In many applications, ranging from character recognition to signal detection
to automatic target identification, the problem of signal classification is
of interest. Often, for example, a signal is known to belong to one of a family of
sets $C_1, \ldots, C_n$ and the goal is to classify the signal according to the set to which
it belongs. The main purpose of this thesis is to show that under certain conditions
placed on the sets, the theory of uniform approximation can be applied to
solve this problem. Specifically, if we assume that the sets $C_j$ are compact subsets of
a normed linear space, several approaches using the Stone-Weierstrass theorem
give us a specific structure for classification. This structure is a single hidden
layer feedforward neural network. We then discuss the functions which comprise
the elements of this neural network and give an example of an application.

1. Signal Classification



Signal classification is, quite simply, the process of examining a signal 
and determining a class, or group, from which it came. Humans perform many 
instances of signal classification each day, often without even knowing it. For 
example, one might read a signature (the signal) carefully to determine the 
author (the class). This might be a process that would be extremely hard for a 
computer to perform. 

There are numerous military, civilian, and academic problems that call
for signal classification. It would be fruitless to attempt to compile an
exhaustive list of applications, so we state and develop here a few problems
in whose solution the theory of signal classification plays an important role.

Automatic Target Recognition 

The field of automatic target recognition is extremely important, primar- 
ily in the area of the military. The main purpose of automatic target recognition 
is the use of computer processing to detect and recognize signatures in sensor 
data [1]. These targets are most often in a cluttered environment and frequently 
in hostile territory. They may include such things as aircraft, missiles, tanks, 
or warships. The clutter in their background may come from temperature or 
pressure disturbances, atmospheric variations, topographical objects, or even 
other targets. 

There are typically two steps to an automatic target recognition problem:
detection and identification. Usually some relatively fast and coarse method is
used to detect an object from background noise, and a slower more precise 
method is used to identify it. Typical features that are required to be extracted 
from the target when it is detected often include its position, its size and shape, 
and its speed. 

In order to measure these quantities, an automatic target recognition 
system will possess sensors such as high resolution cameras and complex radar 
arrays. These sensors will obtain data and send it to the processing portion of 
the system. The system will then determine first whether a target even exists 
and then attempt to identify the target. 

It is clear that the second portion of the problem (the
identification) is essentially a pure classification problem. Once it is determined
that a tank is found, for example, it is important to be able to quickly determine 
whether the tank is friendly or hostile. An automatic recognition system thus 
frequently consists of several modules, one of which is the classifier. 

Usually the classifier is designed with the assumption that each input, 
once found, belongs to only one of the classes. This assumption will become 
important later because it will allow us to make use of some well-known math- 
ematical theorems in order to determine when classification may be possible. 

Pattern Recognition 

A second application of the theory of signal classification is in the field of
pattern recognition. This is an extremely broad field, concerning a wide range
of problems of practical interest, including character recognition and speech
identification.

One classical application is the reading of characters written either by 
hand or by machine. This application has a wide range of uses in government 
and commercial industry. For example, computers used by the post office are 
able to identify machine-written letters on envelopes in order to sort them.
Another important area deals with financial institutions. In these cases, the 
problem typically deals with classifying an input character into one of the thirty- 
six classes formed by the characters in the alphabet and the ten numerals. The 
area of printing is usually prescribed, so it is easy to locate and segment the 
characters. Some form of sampling is usually done, and then an algorithm 
determines the character. 

There are also several problems in the field of speech recognition that rely 
heavily on classification theory. These problems include the following: speaker 
identification, speaker verification, and isolated word recognition [16]. In a 
speaker verification system, the number of classes relates to the number of 
different individuals that one wishes to recognize. In isolated word recognition, 
the number of classes will depend on the “vocabulary” of the system and may 
be as large as 10,000. 

Many problems dealing with pattern recognition are found in the area 
of medicine as well. There are many applications that result in continuous 
functions, two-dimensional gray scale images, and time-varying images. These 
include results from electrocardiograms, electroencephalograms, and X-ray images,
to name a few. Cell analyzers classify blood cells in a population and
determine cell type. Signal classification routines are of enormous importance 
in gathering fast information from these and other biological data. 

These are just some of the many real-world applications in which signal
classification plays an important role, and they motivate the development of
routines that perform well in signal processing problems. It is in this light
that we consider the problem of determining a structure suitable for
classification.

2. Neural Networks



It has long been recognized that the human brain functions in a completely
different way from the modern digital computer. There has been a great
interest in studying how the human brain works and in determining whether it 
is feasible to design a model capable of solving problems in a similar manner. 
Ramón y Cajal in 1911 introduced the concept of neurons as the basic elements
of the brain [11]. It has been determined that neurons process information
one hundred thousand to one million times slower than a basic silicon gate chip. 
The brain compensates for this slower speed by possessing in the neighborhood 
of 10 billion neurons and 60 billion synapses, or interconnections between the 
neurons [21]. As a result the brain is capable of performing many tasks at rates 
much greater than even the fastest computer. It is in an attempt to emulate 
this capability of the brain that the field of neural networks, or artificial neural 
networks, was born. 

The history of neural networks dates back to the 1940’s, when McCulloch 
and Pitts in 1943 proposed a computational model of an element resembling a 
neuron [3]. After some initial research, the idea faded until interest began to 
return in the 1980’s. Since then, the field of neural networks has grown rapidly, 
with interest from researchers in a number of fields ranging from engineering to 
physics to psychology. 

A neural network, essentially, is a structure that attempts to model the 
way the brain performs some task and then to perform that task in a similar 
manner. The structure may be electronically built or simulated in software, for
example. A neural network will contain a large number of individual cells, which
model the neurons, and a number of interconnections between them, which 
model the synapses. Often the information passed through the interconnections 
will be multiplied by constants in order to achieve a certain task. This is known 
as weighting. Haykin gives a definition as adapted from Aleksander and Morton 
in 1990: 

A neural network is a massively parallel distributed processor that 
has a natural propensity for storing experiential knowledge and
making it available for use. It resembles the brain in two respects: 

1. Knowledge is acquired by the network through a learning pro- 
cess. 

2. Interneuron connection strengths known as synaptic weights 
are used to store the knowledge. 

The learning process mentioned here is often an attempt to modify the 
interconnection weights in order to accomplish the designated task. This at- 
tempt compares with the well-known field of adaptive filter theory, where filter 
weights are adapted over time until they approach a steady-state value. 

There are many benefits that arise from neural networks’ inherent struc- 
ture. The following are some of them (see [11]). 

1. Nonlinearity. The functions performed by the neurons are nonlinear; 
therefore the entire network, which is a weighted connection of these neurons,
will also be nonlinear. This helps in modeling typical applications,
which are often nonlinear. 

2. Input-output Mapping. One way in which the values for the weights used 
in the interconnections of the neural network are obtained is by a process 
called training. An example input is given, and weights are chosen so 
that the error between the actual output and some known desired output 
is minimized. This training procedure is repeated until the values of the 
weights reach a steady state (if possible). Thus the neural network learns 
by creating an input-output mapping. 

3. Adaptivity. A neural network has the property of adapting its synaptic 
weights in order to match a change in the surrounding environment. When 
it is operating in one environment, it may be retrained to operate in 
another environment which has only minimal changes. Further, a neural 
network operating in a nonstationary environment is able to adapt its 
weights in real time. 

4. Evidential Response. A neural network, when faced with a choice, is often 
able not only to select the right choice, but to give a confidence about the 
choice it made. For example, a neural network used for classification and 
given an input signal may output the class for that signal as well as how 
sure it is that that is actually the correct class. 

5. Fault Tolerance. Since each of the many neurons in a neural network
stores an important bit of information, the network’s power is distributed
over each of these neurons. This allows the network in theory to continue
operating even when one of the neurons fails, though with some degrada- 
tion in performance. Neural networks are thus often marked by a gradual 
decay in performance instead of a single catastrophic failure. 

6. Uniformity of Analysis and Design. Because all neural networks are similar
in a structural sense and the same notation is used in the applications
of neural networks to different problems, they are in a sense universal. 
This is seen in the following properties: 

• Neurons are common to all neural networks. 

• This commonality allows for the sharing of information between neu- 
ral networks in different applications. 

• It is possible to build modular networks easily simply by integrating 
the different modules. In other words, parts of different networks 
(or even entire networks) may be used easily in conjunction with one 
another to create a new network. 

As neurons are the building blocks of a neural network, their modeling is
most important. The basic design for a neuron is fairly simple. A set of synapses
provides the inputs to the neuron. These interconnections are weighted by real
numbers, the synaptic weights. The weighted values are then summed. Finally, this
sum is passed through a (typically) nonlinear activation function. This function
usually serves to limit the output of the neuron to some desired range, for
example [0, 1] or [-1, 1]. An example of this model of a neuron is shown in
Figure 1.

Figure 1: Nonlinear model of a neuron
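The neuron model just described (weighted inputs, a sum, then an activation function) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `neuron` and `logistic`, and the choice of `math.tanh` as the default activation, are assumptions made for the example and are not taken from the text.

```python
import math


def logistic(s):
    """Logistic sigmoid (assumed example activation); output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron(inputs, weights, activation=math.tanh):
    """Hypothetical single neuron: weight the inputs, sum them, then apply
    the activation function to the sum."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)


# tanh keeps the output in [-1, 1]; the logistic function keeps it in (0, 1).
out_tanh = neuron([1.0, 2.0], [0.5, -0.25])
out_logistic = neuron([1.0, 2.0], [0.5, -0.25], activation=logistic)
```

With `tanh` the output stays in the range [-1, 1], and with the logistic function it stays in (0, 1), which matches the role of the activation function as a limiter.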

While the neurons themselves are modeled more or less the same regard- 
less of the application, there are different architectures for the actual network. 
We will be concerned with just one particular type, called a feed-forward net- 
work with one hidden layer. This network architecture consists of a large number 
of neurons arranged schematically in three layers. This may be seen in Figure 2.

In theory, each unit of the input layer may be connected to each unit of 
the hidden layer. This connection has a weight, which as mentioned above is a 
real number, associated with it. The weights are denoted by $w_{ij}$. So each unit
on the hidden layer receives a weighted sum of elements from the input layer 
and then processes this sum with an activation function. Finally, the result of 
this activation is transmitted to the output layer with another set of weights 
and then summed. The result for the network structure shown in Figure 2 is: 

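$$\sum_{i=1}^{N} \alpha_i \, \sigma\Big( \sum_{j=1}^{n} w_{ij} x_j \Big),$$

where $x_j$ are the inputs, $w_{ij}$ the weights between the input and hidden layers, $\sigma$ the
activation function, and $\alpha_i$ the weights between the hidden and output layers.

Figure 2: A feed-forward neural network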
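The computation performed by this architecture (a weighted sum at each hidden unit, an activation, then a second weighted sum at the output) can be sketched as follows. The names `feedforward`, `W`, `alpha`, and `sigma` are assumptions chosen for this illustration, and `math.tanh` is just one possible activation function.

```python
import math


def feedforward(x, W, alpha, sigma=math.tanh):
    """Hypothetical single-hidden-layer network output:
    sum_i alpha[i] * sigma(sum_j W[i][j] * x[j]).

    x     : list of inputs
    W     : list of rows; W[i][j] is the weight from input j to hidden unit i
    alpha : list of weights from the hidden units to the output
    """
    hidden = [sigma(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]
    return sum(a_i * h_i for a_i, h_i in zip(alpha, hidden))


# Two inputs, two hidden units, one output.
y = feedforward([1.0, 2.0], [[0.5, -0.25], [1.0, 1.0]], [2.0, -1.0])
```

Passing the identity as `sigma` reduces the network to a purely linear map, which makes the role of the nonlinear activation easy to see.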



Finally, it is important to note that a problem cannot necessarily be
solved simply by constructing a neural network at random and then
attempting to train the weights. One must determine when a solution
is possible and what network structure to try. Later it will be shown that
a certain type of neural network is capable of solving an important classification
problem.

3. Background



Metric Spaces 

A type of space that will play a particularly important role in the study 
of approximation is a metric space. They are described in detail in many books, 
for example [9], [13], and [18]. 

Definition: A metric space is a pair $(X, \rho)$ where $X$ is a set of elements and
$\rho$ is a metric, or distance function, that is nonnegative and real-valued with the
following properties:

1. $\rho(x, y) = 0$ if and only if $x = y$;

2. $\rho(x, y) = \rho(y, x)$;

3. $\rho(x, y) + \rho(y, z) \geq \rho(x, z)$.

Some examples of metric spaces are: 

Example 1: The set of real numbers with metric $\rho(x, y) = |x - y|$, referred to
as $\mathbb{R}$ or $\mathbb{R}^1$.

Example 2: The set of all ordered $n$-tuples $x = (x_1, x_2, \ldots, x_n)$, with metric

$$\rho(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}.$$

This space is generally referred to as $\mathbb{R}^n$.

Example 3: The set of continuous functions defined on a closed interval $[a, b]$
with metric

$$\rho(f, g) = \max_{a \leq t \leq b} |f(t) - g(t)|.$$

Example 4: This same set of continuous functions along with the metric

$$\rho(f, g) = \left( \int_a^b [f(t) - g(t)]^2 \, dt \right)^{1/2}$$

forms a different yet equally valid metric space (a subspace of $L^2[a, b]$). Thus, the
metric as well as the set of points must be known in order for the space to be
completely determined.
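The four example metrics can be evaluated numerically to see that the same pair of functions is at different distances under different metrics. The sketch below is an illustration only: the function names are assumptions, and the two function-space metrics are approximated on a uniform grid (maximum over grid points, trapezoidal rule for the integral) rather than computed exactly.

```python
import math


def rho_real(x, y):
    """Metric on the real line: |x - y|."""
    return abs(x - y)


def rho_euclid(x, y):
    """Euclidean metric on n-tuples."""
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))


def rho_max(f, g, a, b, samples=1001):
    """Grid approximation of max over [a, b] of |f(t) - g(t)|."""
    ts = [a + (b - a) * k / (samples - 1) for k in range(samples)]
    return max(abs(f(t) - g(t)) for t in ts)


def rho_l2(f, g, a, b, samples=1001):
    """Trapezoidal approximation of (integral over [a, b] of (f - g)^2)^(1/2)."""
    h = (b - a) / (samples - 1)
    vals = [(f(a + k * h) - g(a + k * h)) ** 2 for k in range(samples)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return math.sqrt(integral)


def identity(t):
    return t


def zero(t):
    return 0.0


# The same two functions on [0, 1] are at distance 1 in the max metric
# but at distance about 0.577 in the integral metric.
d_max = rho_max(identity, zero, 0.0, 1.0)
d_l2 = rho_l2(identity, zero, 0.0, 1.0)
```

The two distances differ even though the underlying set of functions is the same, which is the point of the remark that the metric is part of the definition of the space.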

Let $X$ be a metric space with $x_0 \in X$ and let $r > 0$. We define an open
ball with radius $r$ centered about $x_0$ (written $b(x_0, r)$) to be the set of points
$x \in X$ such that $\rho(x, x_0) < r$. Let $A \subset X$. We define a point $x \in A$ to be an
interior point of the set $A$ if $b(x, r) \subset A$ for some $r > 0$. That is, we can find
an open ball surrounding the point $x$ such that every point in the ball belongs
to the set $A$. It is in this way that we go about defining open sets in a metric
space. In fact, a set $A \subset X$ is called an open set if all of its points are interior
points.

Example 1: Consider the set $(0, 1)$ in $\mathbb{R}$. Given any point in the set, it is
possible to choose an open ball of some radius such that the ball is contained
in $(0, 1)$. Therefore, $(0, 1)$ is open in $\mathbb{R}$.

Example 2: On the other hand, consider the set $[0, 1)$ in $\mathbb{R}$ and look at any
open ball about the point 0 with radius $r$. Whatever the choice of $r$, there will
be points contained in the ball that are not in $[0, 1)$ (for example, the point
$-r/2$); therefore the point 0 is not an interior point of the set $[0, 1)$. Therefore
the set is not open.

Let $X$ be a metric space and $x \in X$. We define a neighborhood of $x$ as a
set containing an open set containing $x$. This open set will necessarily contain
an open ball $b(x, \epsilon)$ for some $\epsilon > 0$. Therefore, every neighborhood of a point
will contain an open ball about that point. Again let $X$ be a metric space and let
$A \subset X$. A point $x \in X$ is called a contact point of $A$ if every neighborhood of
$x$ contains at least one point in $A$. Obviously all $x \in A$ are contact points of
$A$. If every neighborhood of $x$ contains infinitely many points in $A$, then $x$ is
called a limit point of $A$. Note that a limit point is necessarily a contact point
by definition. The closure of a set $A$, written as $\bar{A}$, is simply the set of all the
contact points of $A$. A set which is equal to its closure ($A = \bar{A}$) is known
as a closed set.

Example 1: Consider again the set $[0, 1)$ in $\mathbb{R}$. It is not possible to find an
open ball about the point 1 that does not contain any points in $[0, 1)$. Therefore
every neighborhood of 1 contains at least one point (in fact, every neighborhood
contains infinitely many points) in the set $[0, 1)$. This implies that 1 is a contact
point (and a limit point) of the set $[0, 1)$. Since $1 \notin [0, 1)$, the set does not
coincide with its closure (in fact, as expected, $\overline{[0, 1)} = [0, 1]$) and is therefore
not a closed set.

Example 2: On the other hand, the set [0, 1] can be shown to be closed as its 
closure is the very same set [0, 1]. 

One of the most important concepts concerning metric spaces is that of
continuity. Let $(X, \rho_x)$ and $(Y, \rho_y)$ be metric spaces and let $f$ be a function such
that $f : X \to Y$. Then $f$ is continuous at the point $p \in X$ if for every $\epsilon > 0$
there exists a $\delta > 0$ such that $\rho_y(f(x), f(p)) < \epsilon$ whenever $\rho_x(x, p) < \delta$.

A sequence $\{x_n\}$ in a metric space $X$ is said to converge if there is a point
$p \in X$ with the following property: For every $\epsilon > 0$ there is an integer $N$ such
that $n > N$ implies that $\rho(x_n, p) < \epsilon$. We write this as $x_n \to p$ or $\lim_{n \to \infty} x_n = p$.
We define $\{x_n\}$ to be a Cauchy sequence in a metric space $X$ if for every $\epsilon > 0$,
there exists a positive integer $N$ such that $\rho(x_n, x_m) < \epsilon$ for $n, m > N$. We can
easily show that every convergent sequence is a Cauchy sequence; the converse
holds in $\mathbb{R}^n$ but not in every metric space.
A metric space is said to be complete if every Cauchy sequence converges to a
point in the space. The completeness of certain metric spaces is very important
to proving results in those spaces.
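A standard illustration of an incomplete metric space is the set of rational numbers with the metric $|x - y|$. The sketch below builds rational Newton iterates for $\sqrt{2}$: the terms bunch together like a Cauchy sequence, yet no rational number can be their limit. The function name `newton_sqrt2` and the starting value are assumptions made for this example, and a finite list of terms can only suggest, not prove, the Cauchy property.

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """Rational Newton iterates x_{n+1} = (x_n + 2 / x_n) / 2, starting at 1.

    Every term is an exact rational number (a Fraction)."""
    x = Fraction(1)
    seq = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        seq.append(x)
    return seq


seq = newton_sqrt2(6)
# Distances between consecutive terms shrink rapidly ...
gaps = [abs(seq[n + 1] - seq[n]) for n in range(len(seq) - 1)]
# ... and the squares approach 2, but no rational term ever satisfies x^2 = 2.
residuals = [abs(x * x - 2) for x in seq]
```

Viewed inside $\mathbb{R}$ the same sequence converges to $\sqrt{2}$, so whether a Cauchy sequence converges depends on the space in which it is considered.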

In a similar manner, we say that a sequence of functions $\{f_n\}$ from $X$ to
$\mathbb{R}$ converges uniformly on $X$ to a function $f$ if for every $\epsilon > 0$ there exists an
integer $N$ such that $n > N$ implies $|f_n(x) - f(x)| < \epsilon$ for all $x \in X$. We often write
this as $f_n \to f$ uniformly. For a discussion in greater depth of convergence, see
[19].
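To make the definition of uniform convergence concrete, one can estimate the largest gap between $f_n$ and its limit over the whole domain. The sketch below uses the functions $f_n(x) = x^n$, which tend to 0 at every point of $[0, 1)$; this choice of example, the helper names, and the use of a finite grid to estimate the supremum are all assumptions made for illustration.

```python
def grid(a, b, samples=2001):
    """Uniform grid on [a, b] used to estimate a supremum."""
    return [a + (b - a) * k / (samples - 1) for k in range(samples)]


def sup_distance(f, g, points):
    """Largest value of |f(x) - g(x)| over the given points."""
    return max(abs(f(x) - g(x)) for x in points)


def power(n):
    """The function x -> x**n."""
    return lambda x: x ** n


def zero(x):
    return 0.0


half = grid(0.0, 0.5)         # the interval [0, 1/2]
near_one = grid(0.0, 0.999)   # most of the interval [0, 1)

# On [0, 1/2] the largest gap is (1/2)^n, which tends to 0: uniform convergence.
gap_half = [sup_distance(power(n), zero, half) for n in (1, 10, 50)]
# Close to 1 the largest gap stays near 1 for these n, so on [0, 1)
# the convergence is pointwise but not uniform.
gap_near_one = [sup_distance(power(n), zero, near_one) for n in (1, 10, 50)]
```

The single integer $N$ in the definition must work for all $x$ at once, which is exactly what fails near $x = 1$.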

Topological Spaces 

Although metric spaces are usually the most general space needed, there 
may be times when a result may be proved for a more general space. It is for 
this purpose that we now introduce the topological space. 

Definition: A topological space is the pair $(X, \tau)$ consisting of a set of points
$X$ and a topology $\tau$, where $\tau$ is a family of subsets $G \subset X$, called open sets,
with the following properties:

1. The set $X$ itself and the empty set $\emptyset$ belong to $\tau$.

2. Arbitrary unions $\bigcup_{\alpha} G_\alpha$ and finite intersections $\bigcap_{k=1}^{n} G_k$ of open sets belong
to $\tau$.

The definitions of open and closed sets in a topological space $X$ are quite
simple. A set $A \subset X$ is an open set if $A$ belongs to $\tau$. A set $B$ in a topological
space $X$ is a closed set if its complement $X - B$ is open.

We can also extend the concepts of a neighborhood, contact point, limit
point, and closure of a set to a topological space. By a neighborhood of $x$, we
mean any open set $G$ containing $x$. A point $x \in X$ is a contact point of $T \subset X$
if every neighborhood of $x$ contains at least one point in $T$. A point $x \in X$ is a
limit point of $T \subset X$ if every neighborhood of $x$ contains infinitely many points
in $T$. Finally, the closure of a subset $T$ of a topological space $X$ is the set of all
the contact points of $T$.

Two important types of topological spaces are Hausdorff spaces and normal
spaces. A topological space $X$ is called a Hausdorff space if:

1. Sets consisting of single points are closed. 

2. For every pair of distinct points $x$ and $y$ in $X$, there are disjoint
neighborhoods of $x$ and $y$.

A topological space is called a normal space if:

1. Sets consisting of single points are closed.

2. For every pair of disjoint closed sets $A$ and $B$, there are disjoint
neighborhoods of $A$ and $B$.

Obviously, every normal space is Hausdorff, though a Hausdorff space need
not be normal. It can be verified that all metric spaces are topological spaces
simply by taking $\tau$ to be the family of sets that are open in the metric
space in the usual sense. This is very important as it allows any result relating
to topological spaces to be applied to metric spaces as well. In fact, we get an
even better result: all metric spaces are normal (and therefore Hausdorff). The
converses of both of these statements, however, are not true.

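Example: The topological space consisting of only two points $\{0, 1\}$, where $\tau$
consists only of the sets $\{0, 1\}$ (the entire space) and $\emptyset$, is not a metric space;
it is not even Hausdorff, since the two points have no disjoint neighborhoods.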
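For a finite set, the topology axioms and the separation property of a Hausdorff space can be checked by brute force. The sketch below does this for the two-point example; the function names are assumptions, and the shortcut of checking only pairwise unions and intersections is valid only because the family of sets is finite.

```python
from itertools import combinations


def is_topology(X, tau):
    """Check the topology axioms for a finite family tau of subsets of X."""
    X = frozenset(X)
    tau = {frozenset(G) for G in tau}
    if X not in tau or frozenset() not in tau:
        return False
    # For a finite family, closure under pairwise unions and intersections
    # implies closure under all unions and all finite intersections.
    return all((A | B) in tau and (A & B) in tau for A, B in combinations(tau, 2))


def is_hausdorff(X, tau):
    """Check that every pair of distinct points has disjoint open neighborhoods."""
    tau = [frozenset(G) for G in tau]
    for x, y in combinations(X, 2):
        separated = any(
            x in U and y in V and not (U & V) for U in tau for V in tau
        )
        if not separated:
            return False
    return True


X = {0, 1}
trivial = [set(), {0, 1}]            # the topology from the example
discrete = [set(), {0}, {1}, {0, 1}]  # every subset is open
```

The trivial topology passes `is_topology` but fails `is_hausdorff`, while the discrete topology (which does come from a metric, the discrete one) passes both.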

Continuity in a topological space is defined somewhat differently from
continuity in a metric space. Let $(X, \tau_x)$ and $(Y, \tau_y)$ be two topological
spaces and let $f : X \to Y$. Then $f$ is continuous if $f^{-1}(A) \in \tau_x$ for every $A$ in
$\tau_y$. In other words, continuity means that the inverse image of every open set is
open.

A family $\mathcal{M}$ of subsets $M_\alpha$ of a topological space $X$ is called a cover of
$X$ if $X \subset \bigcup_{\alpha} M_\alpha$. If the family consists entirely of open sets, then we call the
family an open cover. A topological space is compact if every open cover has a
finite subcover.

Although metric spaces possess many of the nice properties that we
would like to have for topological spaces, it is not true that all metric spaces
are compact. There are some theorems (see for example [14]), however, that
allow us to determine whether a given metric space is compact without having
to view it as a topological space.

Let $A$ and $B$ be subsets of the metric space $X$ and let $\epsilon > 0$. Then the set $A$ is called
an $\epsilon$-net for the set $B$ if for any $x \in B$ there exists a point $x_\alpha \in A$ such that
$\rho(x, x_\alpha) < \epsilon$.

Theorem 1 (Hausdorff). For compactness of a set $M$ of a metric space $X$ it
is necessary that there should exist a finite $\epsilon$-net of the set $M$ for every $\epsilon > 0$.
If the space $X$ is complete, then the condition is also sufficient.

Roughly speaking, a set is compact if, for any prescribed radius, we can find a
finite number of points and take open balls of that radius centered at those
points such that the union of all the open balls contains the set. There are some
improvements to this if we consider certain specific spaces.
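The idea of covering a set by finitely many small balls can be illustrated with a greedy construction. The sketch below is an assumption-laden illustration, not a method from the text: it works on a finite sample of the unit square with the Euclidean metric, and it picks the net points from the sample itself.

```python
import math


def greedy_epsilon_net(points, eps):
    """Greedily pick centers so that every point is within eps of some center.

    A point is added as a new center only if it is at distance >= eps
    from all centers chosen so far."""
    net = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in net):
            net.append(p)
    return net


# A 21 x 21 sample of the unit square and a net for it with eps = 0.3.
points = [(i / 20, j / 20) for i in range(21) for j in range(21)]
net = greedy_epsilon_net(points, 0.3)
```

Every sample point ends up inside the open ball of radius 0.3 about some center, so the handful of points in `net` plays the role of a finite net for the sample.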

Example 1: (Heine-Borel). A subset of $\mathbb{R}$ is compact if and only if it is closed
and bounded.

Example 2: (Arzela). The functions of a set $A$ are said to be uniformly
bounded if there exists a constant $K$ such that $|x(t)| < K$ for all $t$ and all $x \in A$.
The same functions are equicontinuous if given $\epsilon > 0$, there exists a $\delta > 0$ such
that $|x(t_1) - x(t_2)| < \epsilon$ for all $x \in A$ whenever $|t_1 - t_2| < \delta$. A set $A \subset C[0, 1]$, the space
of real-valued continuous functions on the closed interval $[0, 1]$, is compact if
and only if $A$ is closed and the functions $x \in A$ are uniformly bounded and
equicontinuous.



Linear Spaces 

We now introduce the concept of a linear space. 

Definition: A nonempty set $L$ is called a linear space if it satisfies the following
axioms:

1. Any two elements $x \in L$, $y \in L$ uniquely determine a third element
$x + y \in L$ called the sum of $x$ and $y$ that satisfies the following properties:

(a) $x + y = y + x$ (commutativity);

(b) $(x + y) + z = x + (y + z)$ (associativity);

(c) $L$ contains an element 0, called the zero element, such that for all
$x \in L$, $x + 0 = x$;

(d) For each $x \in L$, there exists an element $-x \in L$ such that $x + (-x) = 0$,
where 0 is the zero element;

2. There exists a product operation such that any element $x \in L$ and any
number $\alpha$ determine a unique element $\alpha x \in L$ such that:

(a) $\alpha(\beta x) = (\alpha\beta)x$;

(b) $1x = x$;

3. The operations of addition and multiplication obey the following distributive
axioms:

(a) $(\alpha + \beta)x = \alpha x + \beta x$;

(b) $\alpha(x + y) = \alpha x + \alpha y$.

The elements $x, y, \ldots$ of a linear space are often called vectors, and the
entire space is often called a vector space. The numbers $\alpha, \beta, \ldots$ are referred
to as scalars and the entire set of allowable scalars is referred to as the field.
Typically, the field is the set of real numbers, in which case the space is referred
to as a real linear space. A subset $L_0$ of a linear space $L$ is referred to as a linear
subspace of $L$ if $L_0$ itself is a linear space over the same field as $L$.

A linear space need not possess any topology whatsoever as long
as it satisfies the three properties above. However, in many applications the
concepts of a linear space and topological space are combined. A space that
is both a linear space and a topological space is referred to either as a linear
topological space or a topological vector space. We require additionally only
that the vector operations of addition and multiplication (which are not always
the usual addition and multiplication) be continuous in the topology $\tau$. It is
also possible to apply the concept of a metric to a linear space, but what is more
useful is to define an operation a bit more specific than a metric, called a norm,
and apply it to a linear space.

Normed Linear Spaces 

Definition: A linear space $L$ equipped with an operation called a norm ($\|\cdot\|$)
is called a normed linear space if $\|\cdot\|$ satisfies the following three properties:

1. $\|x\| \geq 0$ for all $x$, where $\|x\| = 0$ if and only if $x = 0$;

2. $\|\alpha x\| = |\alpha| \, \|x\|$ for all $x \in L$ and all $\alpha$;

3. $\|x + y\| \leq \|x\| + \|y\|$ for all $x$ and $y$ in $L$.

Just as every metric space is also a topological space, every normed linear 
space may also be considered a metric space (and therefore a topological space 
as well) by taking the metric to be: 

$$\rho(x, y) = \|x - y\|.$$

Again, the converse is not true. 

Example: The metric space consisting of the closed interval $[0, 1]$ with the
“discrete metric” $\rho(x, y) = 1$ if $x \neq y$ and $\rho(x, x) = 0$ cannot be made into a
normed linear space.
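One way to see why a discrete metric cannot come from a norm is that any metric of the form $\|x - y\|$ scales with the points: by the second norm property, the distance between $\alpha x$ and $\alpha y$ is $|\alpha|$ times the distance between $x$ and $y$. The sketch below checks this numerically. It is an illustration under stated assumptions: the check is carried out on the real line with the absolute value as the norm (rather than on $[0, 1]$, which is not closed under scaling), and the function names are hypothetical.

```python
def norm_metric(x, y):
    """Metric induced by the absolute value norm on the real line."""
    return abs(x - y)


def discrete_metric(x, y):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1


x, y, alpha = 0.2, 0.7, 3.0

# A norm-induced metric is homogeneous: scaling both points by alpha
# scales the distance by |alpha| (up to floating-point rounding).
scaled_norm = norm_metric(alpha * x, alpha * y)
expected_norm = abs(alpha) * norm_metric(x, y)

# The discrete metric ignores scaling: the distance stays 1.
scaled_discrete = discrete_metric(alpha * x, alpha * y)
expected_discrete = abs(alpha) * discrete_metric(x, y)
```

The mismatch between `scaled_discrete` and `expected_discrete` shows the discrete metric violating the scaling behavior that every norm-induced metric must have.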

A normed linear space that is complete (in the same sense that a metric 
space is complete) is known as a Banach space. 

One special Banach space is called a Hilbert space. 

Definition: A Hilbert space is a Banach space with the norm $\|x\| = \langle x, x \rangle^{1/2}$,
where $\langle \cdot, \cdot \rangle$ is an inner product with the following properties (assuming the
space is real):

1. $\langle x, y \rangle = \langle y, x \rangle$;

2. $\langle a_1 x_1 + a_2 x_2, y \rangle = a_1 \langle x_1, y \rangle + a_2 \langle x_2, y \rangle$;

3. $\langle x, x \rangle > 0$ for all $x \neq 0$.



The most common example of a Hilbert space is the $n$-dimensional space
$\mathbb{R}^n$, with the Euclidean norm $\|x\| = \sqrt{\sum_{k=1}^{n} x_k^2}$, where $x = (x_1, x_2, \ldots, x_n)$.

The Hahn-Banach Theorem and Separation in Linear Spaces 

One of the most important and fundamental results in all real analysis 
is the Hahn-Banach theorem. There are many different forms of the theorem 
and in most cases any version of the theorem can be used to directly prove 
any other version. It is first necessary to introduce the idea of convex sets and 
convex functionals. 

Definition: A set $M \subset L$ is called a convex set if for each pair of points $x$,
$y \in M$, all points on the line segment joining $x$ and $y$ (that is, all points of the
form $kx + (1 - k)y$, $0 \leq k \leq 1$) are also elements of $M$.

Definition: A functional $p$ defined on a real linear space $L$ is said to be convex
if it has the following properties:

1. $p(\alpha x) = \alpha p(x)$ for all $x \in L$ and all $\alpha \geq 0$;

2. $p(x + y) \leq p(x) + p(y)$ for all $x, y \in L$.
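The two properties of a convex functional (positive homogeneity and $p(x + y) \leq p(x) + p(y)$) can be spot-checked numerically. The sketch below tests them on random vectors for the Euclidean norm, which is a convex functional in this sense, and for the squared norm, which is not. The helper names, the dimension, the sampling ranges, and the tolerance are assumptions for this illustration, and passing a random test is of course evidence rather than proof.

```python
import math
import random


def euclidean_norm(x):
    """p(x) = ||x||, a convex functional in the sense defined above."""
    return math.sqrt(sum(xk * xk for xk in x))


def squared_norm(x):
    """p(x) = ||x||^2, which fails positive homogeneity."""
    return sum(xk * xk for xk in x)


def check_convex_functional(p, dim=3, trials=1000, seed=0, tol=1e-9):
    """Randomly test p(alpha x) = alpha p(x) for alpha >= 0 and
    p(x + y) <= p(x) + p(y). Returns False at the first violation."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        y = [rng.uniform(-1, 1) for _ in range(dim)]
        alpha = rng.uniform(0, 5)
        homogeneous = abs(p([alpha * xk for xk in x]) - alpha * p(x)) <= tol
        subadditive = p([xk + yk for xk, yk in zip(x, y)]) <= p(x) + p(y) + tol
        if not (homogeneous and subadditive):
            return False
    return True


norm_ok = check_convex_functional(euclidean_norm)
square_ok = check_convex_functional(squared_norm)
```

Any positive multiple of a norm passes the same checks, since scaling $p$ by a positive constant preserves both properties.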

We now turn to the idea of extending a linear functional. Suppose we
have a linear functional defined on a certain subspace. We want to know whether
there exists a linear functional on the entire space that is equal to our first
functional on the subspace. The Hahn-Banach theorem tells us when this is
possible.

Theorem 2 (Hahn-Banach) Let $p$ be a finite convex functional defined on a
real linear space $L$ and let $L_0$ be any subspace of $L$. Let $f_0$ be any linear
functional on $L_0$ satisfying the condition

$$f_0(x) \leq p(x)$$

on $L_0$. Then there exists a linear functional $f$ on $L$, called the extension of $f_0$,
such that $f = f_0$ at every point of $L_0$ and $f(x) \leq p(x)$ on $L$.

Proof: We can assume that $L_0 \neq L$. Let $z$ be any element of $L - L_0$, and let
$L_1$ be the subspace generated by $L_0$ and the element $z$, this being the set of all
linear combinations of the form $x + tz$ ($x \in L_0$, $t \in \mathbb{R}$). For $f$ to be an extension
of $f_0$ onto $L_1$, we need

$$f(x + tz) = f(x) + f(tz) = f_0(x) + t f(z).$$

Now, let $c = f(z)$ and note that if $f$ is an extension onto $L_1$ with $f \leq p$, then
$f_0(x) + tc \leq p(x + tz)$. This condition translates to the two conditions

$$c \leq p(x/t + z) - f_0(x/t) \ \text{ if } t > 0 \quad \text{and} \quad c \geq -p(-x/t - z) - f_0(x/t) \ \text{ if } t < 0.$$

So what remains is to show that there is always a $c$ satisfying these conditions.
In this light, let $y_1$ and $y_2$ be elements of $L_0$. Then

$$f_0(y_2 - y_1) = f_0(y_2) - f_0(y_1) \leq p(y_2 - y_1) = p((y_2 + z) - (y_1 + z)) \leq p(y_2 + z) + p(-y_1 - z).$$

So we get

$$-f_0(y_2) + p(y_2 + z) \geq -f_0(y_1) - p(-y_1 - z).$$

Now let $c_1 = \sup_{y_1} [-f_0(y_1) - p(-y_1 - z)]$ and $c_2 = \inf_{y_2} [-f_0(y_2) + p(y_2 + z)]$.
Then $c_2 \geq c_1$ and it simply remains to choose $c_2 \geq c \geq c_1$ and note that $c$
satisfies the necessary conditions. So the functional $f_1$ defined on $L_1$ by
$f_1(x + tz) = f_0(x) + tc$ satisfies the condition $f_1(x) \leq p(x)$ for $x \in L_1$. An
induction argument not given here extends the result to the entire space $L$.

By applying the Hahn-Banach theorem, we may show a somewhat more
useful result, given in [2].

Theorem 3 Let $f$ be a bounded linear functional defined on the subspace $L$ of
the real normed linear space $X$. Then, there exists a bounded linear functional
$F$ defined on the entire space $X$ so that $F(x) = f(x)$ for $x \in L$ and $\|F\| = \|f\|$.¹

Proof: Since $f$ is a bounded linear functional, then for $x \in L$, $|f(x)| \leq \|f\| \|x\|$.
For $x \in X$ define $p(x) = \|f\| \|x\|$. It is then easy to show that $p$ is convex and
that $f(x) \leq p(x)$. By the Hahn-Banach Theorem, extend $f$ to a new functional
$F$ defined on all of $X$ such that $F(x) \leq p(x) = \|f\| \|x\|$ and $F(x) = f(x)$ for
$x \in L$. Applying the same bound to $-x$ gives $-F(x) \leq \|f\| \|x\|$, so
$|F(x)| \leq \|f\| \|x\|$. Clearly, $F$ is bounded and $\|F\| \leq \|f\|$. Similarly, if $x \in L$, then $|f(x)| =
|F(x)| \leq \|F\| \|x\|$, implying $\|f\| \leq \|F\|$. Combining the two inequalities, we see
that $\|F\| = \|f\|$ and the proof is complete.

^1 The norm operator $\|\cdot\|$, when applied to a bounded linear functional on a normed linear
space $X$ (as is the case here), is defined as $\|f\| = \sup_{\|x\| \le 1} |f(x)|$. Further, $\|f\|$ can easily be
shown to have the following properties: $\|f\| = \sup_{x \neq 0} |f(x)| / \|x\|$ and $|f(x)| \le \|f\|\,\|x\|$ for all $x \in X$.
We now turn to perhaps the most useful corollary of the Hahn-Banach
theorem. It is very desirable in many situations to know that there are a
sufficient number of bounded linear functionals defined on a space to strictly
separate the elements of that space. By strictly separate, we mean that for any
two distinct elements $x_1$ and $x_2$ of a linear space $X$, there exists an $f \in X^*$, the set of
bounded linear functionals on $X$, such that $f(x_1) - f(x_2) \neq 0$. We prove this
in the context of the following theorem.

Theorem 4 Let $X$ be a normed linear space and $x_0 \in X$, $x_0 \neq 0$. Then there
exists an $F \in X^*$ such that $\|F\| = 1$ and $F(x_0) = \|x_0\|$.

Proof: Let $L$ be the linear subspace of $X$ generated by taking the linear span
of $x_0$. All elements in $L$ will thus have a representation $\alpha x_0$, $\alpha \in \mathbb{R}$. Define
the function $f$ on $L$ by $f(\alpha x_0) = \alpha \|x_0\|$. It is seen at once that $f(x_0) = \|x_0\|$
simply by taking $\alpha = 1$. We can then extend $f$ to a bounded linear functional
$F$ defined on the whole space $X$ as noted in the previous theorem. Since $F = f$
on $L$, $F(x_0) = f(x_0) = \|x_0\|$. It thus remains only to show that $\|F\| = 1$. For
any $x \in L$, we see that

$$|f(x)| = |f(\alpha x_0)| = |\alpha|\,\|x_0\| = \|\alpha x_0\| = \|x\|,$$

implying that $\|f\| = 1$ and therefore $\|F\| = 1$ by the previous theorem.
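
In a finite-dimensional setting the functional of Theorem 4 can be written down explicitly. The sketch below assumes $X = \mathbb{R}^n$ with the Euclidean norm, where $F(x) = \langle x_0, x \rangle / \|x_0\|$ works; the helper names are hypothetical and the check of $\|F\| \le 1$ is only a random spot check, not a proof.

```python
import math
import random

def norm(v):
    """Euclidean norm on R^n."""
    return math.sqrt(sum(t * t for t in v))

def norming_functional(x0):
    """Return F with F(x0) = ||x0|| and ||F|| = 1 (Euclidean case only)."""
    n0 = norm(x0)
    if n0 == 0.0:
        raise ValueError("x0 must be nonzero")
    return lambda x: sum(a * b for a, b in zip(x0, x)) / n0

x0 = [3.0, -4.0, 12.0]
F = norming_functional(x0)

# Spot check of |F(x)| <= ||x|| on random vectors (Cauchy-Schwarz).
random.seed(0)
samples = [[random.uniform(-1, 1) for _ in x0] for _ in range(1000)]
worst_ratio = max(abs(F(x)) / norm(x) for x in samples if norm(x) > 0)
```

Applying `norming_functional` to a difference $x_1 - x_2$ of distinct points yields a functional that takes different values at $x_1$ and $x_2$, which is the separation property discussed in the text.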

To prove our assertion about the strict separation of elements in a linear
space by the functionals defined on that space, let $X$ be a normed linear space
and $x_1$ and $x_2$ be distinct elements in $X$. Now define
$x_0 = x_1 - x_2$ and see that $x_0 \neq 0$ since $x_1$ and $x_2$ are distinct. We may now
apply the previous theorem to get an $f \in X^*$ with

$$f(x_1 - x_2) = f(x_0) = \|x_0\| \neq 0.$$

By linearity, $f(x_1) - f(x_2) \neq 0$, as claimed.
4. The Stone-Weierstrass Theorem and Uniform Approximation



In many applications, it is desirable to know whether a certain class of
$\mathbb{R}$-valued functions may be useful in uniformly approximating a larger group of
$\mathbb{R}$-valued functions. Weierstrass proved that it is possible to uniformly
approximate any continuous functional on a compact subset of $\mathbb{R}^n$ by a polynomial in
$n$ variables. Since that time, there have been several different proofs of
Weierstrass' theorem. One of the most useful is the one given by M. H. Stone in
[23]. His primary result, which will be shown, generalizes Weierstrass' result in
that it allows the domain to be any compact set (instead of just any compact
subset of $\mathbb{R}^n$) and the set of approximating functions to be a set other than
polynomials (which may not have meaning on a general compact set).

In order to generalize the theorem, we can view the polynomials as a
subset of the set from which we obtain the approximating functional. We seek to
know what functions may be derived from a certain set of prescribed functions by
the specified algebraic operations of addition, multiplication, multiplication by
real numbers and uniform passage to the limit. The set of prescribed functions
for the polynomials, for example, consists of just two functions: $f_1(x) = 1$
and $f_2(x) = x$ defined on a bounded closed interval $X$ of $\mathbb{R}$. From these two
functions and the algebraic operations alone, the set of all polynomials may
be formed. Weierstrass' theorem then tells us that the uniform passage to the
limit of this set (the polynomials) is the set of all continuous functionals on $X$.
Equivalently, the set of continuous functionals is the uniform closure of the set of
polynomials, or the continuous functions on $X$ may be uniformly approximated
by the set of polynomials.
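
As a numerical illustration of uniform approximation by polynomials, the following sketch uses Bernstein polynomials on $[0, 1]$. This is a classical constructive route to Weierstrass' theorem and is an assumption of this example; it is not Stone's argument. The test function $|x - 1/2|$ and the grid-based error estimate are likewise chosen only for illustration.

```python
import math

def bernstein(f, n):
    """Return the degree-n Bernstein polynomial of f on [0, 1]."""
    coeffs = [f(k / n) * math.comb(n, k) for k in range(n + 1)]
    def poly(x):
        return sum(c * x**k * (1.0 - x)**(n - k) for k, c in enumerate(coeffs))
    return poly

def sup_error(f, g, points=401):
    """Grid estimate of the uniform distance between f and g on [0, 1]."""
    return max(abs(f(i / (points - 1)) - g(i / (points - 1))) for i in range(points))

f = lambda x: abs(x - 0.5)  # continuous but not differentiable at 0.5
errors = {n: sup_error(f, bernstein(f, n)) for n in (4, 16, 64)}
```

The estimated uniform error shrinks as the degree grows, although slowly for this non-smooth function.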

In order to begin proving this generalized theorem, it is instructive to
consider the case of a general topological space $X$ where the specified algebraic
operations are the lattice operations $\vee$ and $\wedge$ defined to be:

$$f \vee g = \max(f, g) \quad\text{and}\quad f \wedge g = \min(f, g).$$

These form the functions $h$ and $k$ defined as:

$$h(x) = \max(f(x), g(x)) \quad\text{and}\quad k(x) = \min(f(x), g(x))$$

for any $x \in X$. Let $C$ be the set of all continuous real functions on $X$ and
$C_0$ be a prescribed subfamily of $C$. We want to obtain the family $U(C_0)$ of all
functions which can be formed from the functions in $C_0$ by the application of
the specified algebraic operations and uniform passage to the limit. In the case
of the lattice operations, it is easily observed that $U(C_0)$ is a part of $C$ closed
under uniform passage to the limit, that is

$$U(C_0) \subset C, \qquad U(U(C_0)) = U(C_0).$$

The first property may be shown by observing that the mappings

$$x \mapsto \max(f(x), g(x)) \quad\text{and}\quad x \mapsto \min(f(x), g(x))$$

are continuous. This follows from the continuity of $f$ and $g$ (necessarily true
since $C_0$ is a subfamily of $C$) and the continuity of the max and min mappings.
Now since the uniform limit of continuous functions is also a continuous function,
clearly $U(C_0) \subset C$. To show that $U(U(C_0)) = U(C_0)$, we can form $U(C_0)$ in
two steps. First, let $U_1(C_0)$ be the set containing all the functions obtained by
applying the lattice operations alone to the functions in $C_0$. Then let $U_2(C_0)$
be the set consisting of the functions obtained from those in $U_1(C_0)$ by uniform
passage to the limit. Clearly,



$$C_0 \subset U_1(C_0) \subset U_2(C_0) \subset U(C_0).$$

It remains to show that $U_2(C_0)$ is closed under the allowable operations, and
therefore $U_2(C_0) = U(C_0)$. Let $f$ be a function which is a uniform limit of
functions $f_n$ in $U_2(C_0)$. Then $f$ must also be in $U_2(C_0)$: given $\epsilon > 0$,
there exists a function $g_n$ in $U_1(C_0)$ such that $|f_n - g_n| < \epsilon/2$, since $U_2(C_0)$ is,
by definition, the set of functions obtained by passing those in $U_1(C_0)$ to a uniform
limit. Also, $|f - f_n| < \epsilon/2$ for $n$ sufficiently large, since $f$ is the uniform limit of the $f_n$.
Therefore, $|f - g_n| < \epsilon$, so $f$ is a uniform limit of functions $g_n$ in $U_1(C_0)$ and
therefore a member of $U_2(C_0)$. We must now show that whenever $f$ and $g$ are in
$U_2(C_0)$, then so are $f \vee g$ and $f \wedge g$. This can be done by observing that if $f$
and $g$ are uniform limits of functions $f_n$ and $g_n$ in $U_1(C_0)$, then $f \vee g$ and $f \wedge g$
are uniform limits of $f_n \vee g_n$ and $f_n \wedge g_n$, respectively.

Theorem 5 Let $X$ be a compact space, $C$ the family of all continuous real
functions on $X$, $C_0$ an arbitrary subfamily of $C$, and $U(C_0)$ the family of all
functions (necessarily continuous) generated from $C_0$ by the lattice operations
and uniform passage to the limit. Then a necessary and sufficient condition for
a function $f$ in $C$ to be in $U(C_0)$ is that, whatever the points $x, y \in X$ and
whatever the positive number $\epsilon$, there exists a function $f_{xy}$ obtained by applying
the lattice operations alone to $C_0$ and such that

$$|f(x) - f_{xy}(x)| < \epsilon \quad\text{and}\quad |f(y) - f_{xy}(y)| < \epsilon.$$

Proof: The necessity is obvious. A proof of the sufficiency, which is not
complicated, is given in [23]. There, Stone also notes the following corollary to the
theorem.

Corollary 1: If $C_0$ has the property that, whatever the points $x, y \in X$, $x \neq y$,
and whatever the real numbers $\alpha$ and $\beta$, there exists a function $f_0$ in $C_0$ for
which $f_0(x) = \alpha$ and $f_0(y) = \beta$, then $U(C_0) = C$.

This tells us that the way in which a function $f$ acts on pairs of points in
$X$ determines whether it can be approximated by functions in $U(C_0)$. This observation leads
to the following theorem.

Theorem 6 Let $X$ be a compact space, $C$ the family of all continuous
(necessarily bounded) real functions on $X$, $C_0$ an arbitrary subfamily of $C$ and $U(C_0)$
the family of all functions (necessarily continuous) generated from $C_0$ by the
linear lattice operations and uniform passage to the limit. Then a necessary
and sufficient condition for a function $f$ in $C$ to be in $U(C_0)$ is that $f$ satisfy
every linear relation of the form $\alpha g(x) = \beta g(y)$, $\alpha\beta \ge 0$, which is satisfied by
all functions in $C_0$. The linear relations associated with an arbitrary pair of
points $x, y$ in $X$ must be equivalent to one of the following distinct types:

1. $g(x) = 0$ and $g(y) = 0$;

2. $g(x) = 0$ and $g(y)$ unrestricted, or vice versa;

3. $g(x) = g(y)$ without restriction on the common value;

4. $g(x) = \lambda g(y)$ or $g(y) = \lambda g(x)$ for a unique value $\lambda$, $0 < \lambda < 1$.

Corollary 1: In order that $U(C_0)$ contain a nonvanishing constant function, it
is necessary and sufficient that the only linear relations of the form $\alpha g(x) =
\beta g(y)$, $\alpha\beta \ge 0$, satisfied by every function in $C_0$ be those reducible to the form
$g(x) = g(y)$.

Proof: It is obvious that when $U(C_0)$ contains a nonvanishing constant
function then conditions (1), (2), and (4) can never be satisfied, so only (3) must be
considered.

Corollary 2: In order that $U(C_0) = C$, it is sufficient that the functions in $C_0$
satisfy no linear relation of the form (1)-(4) of Theorem 6.

This is an important corollary because in practice it is easy to construct
a set of functions with the property that none of the relations (1)-(4) is
satisfied by every function in the set.

Definition: A family of arbitrary functions on a domain $X$ is said to be a
separating family (for that domain) if, whenever $x$ and $y$ are distinct points
of $X$, there is some function $f$ in the family with distinct values $f(x)$, $f(y)$ at
these points.

Corollary 3: If $X$ is compact and if $C_0$ is a separating family for $X$ and contains
a nonvanishing constant function, then $U(C_0) = C$.

Proof: Since $C_0$ contains a nonvanishing constant function, the only relations
that all of its members can satisfy are those of type (3) of Theorem 6. However, since
$C_0$ is a separating family, for distinct $x, y$ in $X$ there is
a function $f \in C_0$ such that $f(x) \neq f(y)$. So condition (3) is not
satisfied by all functions in $C_0$. Therefore none of the conditions are satisfied
by $C_0$ and therefore $U(C_0) = C$.

We now consider the case where $U(C_0)$ is built from the functions in
$C_0 \subset C$ using the operations of addition, multiplication, multiplication by real
numbers (the linear ring operations), and uniform passage to the limit. If $f$ and
$g$ are uniform limits of the sequences $f_n$ and $g_n$ respectively, the product $fg$ is
not in general the uniform limit of the sequence $f_n g_n$. We therefore require that
the set $C$ consist of the bounded continuous functions on $X$. Of course, this is
satisfied automatically when $X$ is compact. This leads to the general theorem.

Theorem 7 Let $X$ be a compact space, $C$ the family of all continuous
(necessarily bounded) functions on $X$, $C_0$ an arbitrary subfamily of $C$ and $U(C_0)$
the family of all functions generated from $C_0$ by the linear ring operations and
uniform passage to the limit. Then a necessary and sufficient condition for a
function $f$ in $C$ to be in $U(C_0)$ is that $f$ satisfy every linear relation of the
form $g(x) = 0$ or $g(x) = g(y)$ which is satisfied by all functions in $C_0$.

Proof: As a lemma, one can show (see [23]) that if $f$ is in $U(C_0)$ then so is
$|f|$. This means that $|f|$ is the uniform limit of functions obtained from $C_0$ by the
linear ring operations. Using a well known representation of the max and min
functions,

$$\max(a, b) = \tfrac{1}{2}(a + b + |a - b|), \qquad \min(a, b) = \tfrac{1}{2}(a + b - |a - b|),$$

we can now see that whenever $f$ and $g$ are in $U(C_0)$ then $f \vee g$ and $f \wedge g$
are in $U(C_0)$ as well. So $U(C_0)$ is closed under the linear lattice operations as
well as the linear ring operations and uniform passage to the limit. Therefore
the results in Theorem 6 are applicable here. It remains to show that the
functions in $U(C_0)$ cannot all satisfy a linear relation of the form given in condition
(4) of Theorem 6. Assume $g(x) = \lambda g(y)$ for every function $g$ in $U(C_0)$, for some pair
$x, y$ in $X$ and some $0 < \lambda < 1$. Then for every $f$ in $U(C_0)$, $f^2$ is also in $U(C_0)$ and the
relations $f^2(x) = \lambda f^2(y)$ and $\lambda f^2(y) = \lambda^2 f^2(y)$ would hold, implying that either
$f(y) = 0$ for every $f$ in $U(C_0)$ or $\lambda = 0, 1$, the second being a contradiction to
the assumption. In the first case the relation reduces to $g(x) = 0$ and $g(y) = 0$.
So we conclude that $f$ is in $U(C_0)$ if and only if it satisfies all
relations of the form $g(x) = 0$ or $g(x) = g(y)$ satisfied by those functions in $C_0$.
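
The representation of max and min used in the proof can be checked directly. The sketch below is illustrative only; the function names are hypothetical, and the pointwise `join` and `meet` correspond to $f \vee g$ and $f \wedge g$.

```python
def lattice_max(a, b):
    """max(a, b) written with ring operations and an absolute value."""
    return 0.5 * (a + b + abs(a - b))

def lattice_min(a, b):
    """min(a, b) written with ring operations and an absolute value."""
    return 0.5 * (a + b - abs(a - b))

def join(f, g):
    """Pointwise lattice operation f v g."""
    return lambda x: lattice_max(f(x), g(x))

def meet(f, g):
    """Pointwise lattice operation f ^ g."""
    return lambda x: lattice_min(f(x), g(x))
```

This is why closure under the ring operations together with closure under $f \mapsto |f|$ is enough to recover the lattice operations.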



We give a definition in order to restate the general theorem. 

Definition: A family $A$ of real functions defined on a set $X$ is said to be an
algebra if (i) $f + g \in A$, (ii) $fg \in A$, and (iii) $cf \in A$ for all $f \in A$, $g \in A$ and
for all real constants $c$, that is, if $A$ is closed under addition, multiplication, and
multiplication by real numbers.

An equivalent form of the general theorem that is often used in practice
is stated in [19] as follows:

Theorem 8 Let $A$ be an algebra of real continuous functions on a compact set
$K$. If $A$ separates points of $K$ and if $A$ does not vanish at any point in $K$, then
any real continuous function on $K$ may be uniformly approximated by elements of $A$.

An argument in [4] extends the theorem to certain normed linear spaces that 
are not necessarily compact. 

Theorem 9 Let $X$ be a normed linear space (or, indeed, any Hausdorff
topological space). If $A$ is a subalgebra of $C(X)$, the continuous functions on $X$,
that contains constants and separates the points of $X$, then $A$ is dense in $C(X)$
in the sense of uniform approximation on compact sets.

Proof: Let $f$ be any element of $C(X)$. We must prove that each neighborhood
of $f$ contains an element of $A$. Let $K$ be a compact set in $X$ and $\epsilon$ a positive
number. By restricting $f$ and all members of $A$ to the compact set $K$, we
can apply the classical version of the Stone-Weierstrass Theorem in $C(K)$. Its
conclusion is that the set

$$\{ g|_K : g \in A \}$$

is dense in $C(K)$. Hence there is an element $g$ in $A$ such that $\|f - g\|_K < \epsilon$.

Now we give some examples from Stone’s original article. 

Theorem 10 Let $X$ be an arbitrary bounded closed subset of $n$-dimensional
Cartesian space, the coordinates of a general point being $x_1, \ldots, x_n$. Any
continuous real function $f$ defined on $X$ can be uniformly approximated by
polynomials in the variables $x_1, \ldots, x_n$. In case the origin $x = (0, \ldots, 0)$ belongs to $X$, the function
$f$ can be uniformly approximated by polynomials vanishing at the origin if and
only if $f$ itself vanishes at the origin. Otherwise $f$ can be uniformly
approximated by such polynomials without qualification.

This is the classical approximation theorem proved by Weierstrass. 

Theorem 11 Let $f$ be an arbitrary continuous real function of the real variable
$\theta$, $0 \le \theta \le 2\pi$, subject to the periodicity condition $f(0) = f(2\pi)$. Then $f$
can be uniformly approximated on its domain of definition by trigonometric
polynomials of the form

$$p(\theta) = \frac{a_0}{2} + \sum_{n=1}^{N} (a_n \cos n\theta + b_n \sin n\theta).$$

Theorem 12 Any continuous real function $f$, which is defined on the interval
$0 \le x < \infty$ and vanishes at infinity in the sense that $\lim_{x \to \infty} f(x) = 0$, can be
approximated by functions of the form $e^{-\alpha x} p(x)$ where $p(x)$ is a polynomial.

Theorem 13 Any continuous real function $f$ which is defined on the interval
$-\infty < x < +\infty$ and which vanishes at infinity in the sense that

$$\lim_{x \to -\infty} f(x) = \lim_{x \to +\infty} f(x) = 0$$

can be uniformly approximated by functions of the form $e^{-\alpha^2 x^2} p(x)$ where $p(x)$
is a polynomial.

Several of these examples will prove useful shortly. 
5. Neural Network Approximation of Continuous Maps



We now examine a structure that has proven useful for
approximation. The structure is based almost entirely on a proof in [20]. We
assume that we have a normed linear space $X$ and a subset $C$ that is nonempty
and compact. We let $X^*$ represent the set of bounded linear functionals on $X$
and $Y$ represent a set of continuous maps from $X$ to $\mathbb{R}$ which is dense in $X^*$ on $C$ in the
usual sense. That is, for each $\phi \in X^*$ and any $\epsilon > 0$, there exists a $y \in Y$
such that $|\phi(x) - y(x)| < \epsilon$ for $x \in C$. Further, for $k = 1, 2, 3, \ldots$ we let $D_k$ be
any family of continuous maps $h : \mathbb{R}^k \to \mathbb{R}$ such that given a compact $E \subset \mathbb{R}^k$
and any continuous $g : E \to \mathbb{R}$ as well as $\sigma > 0$ there exists an $h \in D_k$ such that
$|g(x) - h(x)| < \sigma$ for $x \in E$. Let $U$ be any set of continuous maps $u : \mathbb{R} \to \mathbb{R}$
such that given $\sigma > 0$ and any bounded interval $(\beta_1, \beta_2) \subset \mathbb{R}$ there exists a
finite number of elements $u_1, \ldots, u_\ell$ of $U$ for which $|\exp(\beta) - \sum_j u_j(\beta)| < \sigma$ for
$\beta \in (\beta_1, \beta_2)$.

Theorem 14 (Sandberg) Let $f : C \to \mathbb{R}$. Then the following conditions are
equivalent.

(i) $f$ is continuous.

(ii) Given $\epsilon > 0$ there are a positive integer $k$, real numbers $c_1, \ldots, c_k$, elements
$u_1, \ldots, u_k$ of $U$, and elements $y_1, \ldots, y_k$ of $Y$ such that

$$\Big| f(x) - \sum_{j} c_j u_j[y_j(x)] \Big| < \epsilon$$

for $x \in C$.

(iii) Given $\epsilon > 0$ there are a positive integer $k$, elements $y_1, \ldots, y_k$ of $Y$, and
an $h \in D_k$ such that

$$|f(x) - h[y_1(x), \ldots, y_k(x)]| < \epsilon$$

for $x \in C$.

Proof: First, assume condition (i) holds. Let $V$ be the set of all functions
$v : C \to \mathbb{R}$ such that

$$v(x) = \sum_j a_j \exp(\phi_j(x)),$$

in which the sum is finite and $a_j \in \mathbb{R}$ and $\phi_j \in X^*$. To see that $V$ constitutes
an algebra as defined above, observe that $V$ is closed under addition and
multiplication by real numbers, and that

$$\exp(\phi(x)) \exp(\psi(x)) = \exp(\phi(x) + \psi(x)) = \exp((\phi + \psi)(x)).$$

Taking $\phi = 0$ we can see that $V$ contains constants. Finally, we have
demonstrated previously that the Hahn-Banach theorem guarantees that for any
distinct $x$ and $y$ in $C$ we can choose a $\phi \in X^*$ such that $\phi(x - y) \neq 0$. Therefore, $\exp(\phi(x)) \neq \exp(\phi(y))$, so
$V$ separates the points of $C$. We may now apply the Stone-Weierstrass theorem
guaranteeing uniform approximation on compacta. In other words, for $\epsilon > 0$,
there are a positive integer $n$, real numbers $d_1, \ldots, d_n$, and elements $z_1, \ldots, z_n$
of $X^*$ such that

$$\Big| f(x) - \sum_{j=1}^{n} d_j \exp(z_j(x)) \Big| < \epsilon/3$$

for $x \in C$.

Assume that $\sum_j |d_j| \neq 0$. Choose $\gamma > 0$ such that $\gamma \sum_j |d_j| < \epsilon/3$. Let
$[a', b']$ be an interval in $\mathbb{R}$ that contains all of the sets $z_j(C)$, and let $a \in \mathbb{R}$
and $b \in \mathbb{R}$ be such that $a < a'$ and $b > b'$. That is, the interval $[a, b]$ contains the
interval $[a', b']$. Now, choose $\nu > 0$ such that $|\exp(\beta_1) - \exp(\beta_2)| < \gamma$ for $\beta_1$,
$\beta_2 \in [a, b]$ with $|\beta_1 - \beta_2| < \nu$. Clearly this is possible because of the continuity
of the exponential function. Set $\rho = \min(\nu, a' - a, b - b')$ and choose $y_j \in Y$ such
that $|z_j(x) - y_j(x)| < \rho$, $x \in C$, for all $j$. This gives $|\exp(z_j(x)) - \exp(y_j(x))| < \gamma$,
$x \in C$, for each $j$. Now using a version of the triangle inequality, this gives:

$$\Big| f(x) - \sum_j d_j \exp(y_j(x)) \Big| \le \Big| f(x) - \sum_j d_j \exp(z_j(x)) \Big| + \Big| \sum_j d_j \exp(z_j(x)) - \sum_j d_j \exp(y_j(x)) \Big|$$

$$< \epsilon/3 + \sum_j |d_j|\,|\exp(z_j(x)) - \exp(y_j(x))| < 2\epsilon/3,$$

for $x \in C$.

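Now we choose $u_1, \ldots, u_\ell \in U$ so that

$$\Big| \exp(\beta) - \sum_i u_i(\beta) \Big| < \gamma_1, \qquad \beta \in [a, b],$$

where $\gamma_1 \sum_j |d_j| < \epsilon/3$. Then,

$$\Big| f(x) - \sum_j \sum_i d_j u_i[y_j(x)] \Big| \le \Big| f(x) - \sum_j d_j \exp[y_j(x)] \Big| + \Big| \sum_j d_j \exp[y_j(x)] - \sum_j \sum_i d_j u_i[y_j(x)] \Big|$$

$$< 2\epsilon/3 + \sum_j \Big| d_j \exp[y_j(x)] - d_j \sum_i u_i[y_j(x)] \Big| = 2\epsilon/3 + \sum_j |d_j| \Big| \exp[y_j(x)] - \sum_i u_i[y_j(x)] \Big| < 2\epsilon/3 + \gamma_1 \sum_j |d_j| < \epsilon.$$

Now, since $\sum_j \sum_i d_j u_i[y_j(x)]$ can be rewritten as a single finite sum $\sum_j c_j u_j[y_j(x)]$, with the $c_j$,
$u_j$, and $y_j$ in $\mathbb{R}$, $U$, and $Y$, respectively, we have shown that (i) $\to$ (ii).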
Figure 3: A general structure for approximation (input: $x$; output: approximation to $f(x)$)



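To show that (ii) $\to$ (iii), let $\epsilon > 0$ and suppose that there exist $k$,
$c_1, \ldots, c_k$, $u_1, \ldots, u_k$, and $y_1, \ldots, y_k$ such that

$$\Big| f(x) - \sum_j c_j u_j[y_j(x)] \Big| < \epsilon/2, \qquad x \in C.$$

Let $[a, b]$ be an interval containing every set $y_j(C)$, and let $h \in D_k$ satisfy
$|h(\lambda) - \sum_j c_j u_j(\lambda_j)| < \epsilon/2$ for $\lambda = (\lambda_1, \ldots, \lambda_k) \in [a, b]^k$. Then

$$|f(x) - h[y_1(x), \ldots, y_k(x)]| \le \Big| f(x) - \sum_j c_j u_j[y_j(x)] \Big| + \Big| \sum_j c_j u_j[y_j(x)] - h[y_1(x), \ldots, y_k(x)] \Big| < \epsilon/2 + \epsilon/2 = \epsilon$$

for $x \in C$.

Finally, (iii) $\to$ (i) as $f$ is a uniform limit of continuous functions and
therefore continuous itself.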

This proof has demonstrated a general structure that may be used for
approximation. This structure is shown in Figure 3.

Part (iii) of the theorem shows that the $y_j$'s are simply functions which
are capable of approximating linear functionals defined on the space $X$ (these
may actually be linear functionals themselves) while the structure for $h$ is simply
a continuous memoryless nonlinear system capable of approximating uniformly
on compacta in $\mathbb{R}^k$. In other words, the problem of approximating a function
whose domain may be any compact subset of any normed linear space has been
reduced to the problem of approximating a function on $\mathbb{R}^k$, a subject about
which a great deal is known, and has been shown to some extent in dealing
with the Stone-Weierstrass theorem. Stiles, Sandberg, and Ghosh have shown
in [22] that structures of a similar form have use in the approximation of certain
nonlinear discrete time mappings as well.

Part (ii) of the theorem gives a specific example of the structure of
the network. Again it takes the $y_j$'s to be uniform approximations of linear
functionals on $X$. One possible structure for $h$ is shown in Figure
4. The $u_j$'s, as mentioned before, are drawn from a set capable of uniform
approximation of the exponential function on a bounded set in $\mathbb{R}$. In the
simplest case, from the perspective of the theorem, each $u_j$ may be taken to be
the function $\exp(\cdot)$.
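
The structure in part (ii) can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not Sandberg's construction: $X$ is taken to be $\mathbb{R}^n$, each $y_j$ is an exact linear functional $x \mapsto \langle w_j, x \rangle$, and every $u_j$ defaults to $\exp$. The names `make_network`, `weights`, and `coeffs` are hypothetical.

```python
import math

def make_network(weights, coeffs, u=math.exp):
    """Build x -> sum_j c_j * u(y_j(x)) with y_j(x) = <w_j, x> on R^n."""
    if len(weights) != len(coeffs):
        raise ValueError("need one coefficient per hidden unit")
    def network(x):
        total = 0.0
        for w, c in zip(weights, coeffs):
            y = sum(wi * xi for wi, xi in zip(w, x))  # linear functional y_j
            total += c * u(y)                          # memoryless nonlinearity u_j
        return total
    return network

net = make_network(weights=[[1.0, 0.0], [0.0, -1.0]], coeffs=[2.0, -3.0])
value = net([0.0, 0.0])  # 2*exp(0) - 3*exp(0)
```

The theorem guarantees only that suitable weights and coefficients exist for a given accuracy; it does not say how to find them, and this sketch does not attempt to.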

In a moment we will determine possible choices for the elements $u_j$ in
the approximation network. Now we will look at a similar method of dealing
with this problem given in [4], [7], and [24]. We start by defining a certain class
of functions, called ridge functions, and then immediately give the theorem.

Definition: A function $f : X \to \mathbb{R}$ is called a ridge function if it may be
represented in the form $f = g \circ \phi$, where $g : \mathbb{R} \to \mathbb{R}$ and $\phi \in X^*$, where $X^*$ is
the space of continuous linear functionals on $X$. An alternative equivalent form
of this composite function is $f(x) = g(\phi(x))$ for $x \in X$.

Figure 4: A structure for $h$

It can easily be shown, for example, that all ridge functions on $\mathbb{R}^n$ can
be written in the form

$$f(x) = g(a_1 \xi_1 + a_2 \xi_2 + \cdots + a_n \xi_n)$$

where $x = (\xi_1, \xi_2, \ldots, \xi_n) \in \mathbb{R}^n$.
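
A ridge function on $\mathbb{R}^n$ varies only along the direction $a = (a_1, \ldots, a_n)$. The sketch below illustrates this; the choice $g = \tanh$ and the particular vectors are assumptions made only for the example.

```python
import math

def ridge(g, a):
    """Return f(x) = g(a_1*x_1 + ... + a_n*x_n) for a fixed direction a."""
    return lambda x: g(sum(ai * xi for ai, xi in zip(a, x)))

f = ridge(math.tanh, a=[1.0, 2.0])

# Moving x along a direction d with <a, d> = 0 leaves f unchanged.
x = [0.5, -0.25]
d = [2.0, -1.0]  # 1*2 + 2*(-1) = 0
shifted = [xi + 3.0 * di for xi, di in zip(x, d)]
```

Here `f(x)` and `f(shifted)` agree because the two points have the same inner product with `a`.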



Theorem 15 (Cheney) Let $G$ be a fundamental set in $C(\mathbb{R})$^2 and let $X$ be a
normed linear space. Let $\Phi$ be a subset of $X^*$ such that the set

$$\{ \phi / \|\phi\| : \phi \in \Phi \}$$

is dense in the unit sphere of $X^*$. Then the set of ridge functions $\{ g \circ \phi : g \in
G, \phi \in \Phi \}$ is fundamental in $C(X)$.^3

^2 A subset $Y$ of $X$ is said to be fundamental in $X$ if its linear span is dense in $X$. Thus,
for any $x \in X$ and $\epsilon > 0$ there are elements $y_1, \ldots, y_n \in Y$ and $c_j \in \mathbb{R}$ such that
$\| x - \sum_{j=1}^{n} c_j y_j \| < \epsilon$.

^3 $C(X)$ is, of course, the set of continuous, real-valued functions on the normed linear space
$X$.
Proof: Let $f$ be a member of $C(X)$, $C$ a compact set in $X$, and $\epsilon > 0$. We
have shown above that there exist $u_j \in C(\mathbb{R})$ and $y_j \in X^*$ such that

$$\Big| f(x) - \sum_{j=1}^{P} u_j(y_j(x)) \Big| < \epsilon/3$$

for $x \in C$. By adjusting the functions $u_j$ as necessary, we can assume that
$\|y_j\| = 1$ for $1 \le j \le P$. Let $M = \sup_{x \in C} \|x\|$. Choose $\delta > 0$ so that when
$|s| \le M$, $|t| \le M$, and $|s - t| < \delta$ we get $|u_j(s) - u_j(t)| < \epsilon/3P$ for $1 \le j \le P$.
This is, of course, possible because the $u_j$ are continuous. Now select $\phi_j \in \Phi$ so
that $\| \phi_j/\|\phi_j\| - y_j \| < \delta/M$ for $1 \le j \le P$. Let $\lambda_j = 1/\|\phi_j\|$ and $\mu = \max_j \|\phi_j\|$.
Select $a_{jk} \in \mathbb{R}$ and $g_{jk} \in G$ so that for $|t| \le \mu M$ we have

$$\Big| u_j(\lambda_j t) - \sum_{k=1}^{N} a_{jk} g_{jk}(t) \Big| < \epsilon/3P \qquad (1 \le j \le P).$$

Now let $x \in C$. Then $\|x\| \le M$, $|y_j(x)| \le M$, $|\lambda_j \phi_j(x)| \le M$, and

$$|y_j(x) - \lambda_j \phi_j(x)| \le \|x\|\,\|y_j - \lambda_j \phi_j\| < M(\delta/M) = \delta.$$

From the definition of $\delta$ (i.e., let $s = y_j(x)$ and $t = \lambda_j \phi_j(x)$) we get

$$\Big| \sum_{j=1}^{P} u_j(y_j(x)) - \sum_{j=1}^{P} u_j(\lambda_j \phi_j(x)) \Big| \le \sum_{j=1}^{P} \epsilon/3P = \epsilon/3.$$

Now, because $|\phi_j(x)| \le \|\phi_j\|\,\|x\| \le \mu M$, the definition of $a_{jk}$ and $g_{jk}$ gives

$$\Big| \sum_{j=1}^{P} u_j(\lambda_j \phi_j(x)) - \sum_{j=1}^{P} \sum_{k=1}^{N} a_{jk} g_{jk}(\phi_j(x)) \Big| < \sum_{j=1}^{P} \epsilon/3P = \epsilon/3.$$

Now, by a simple application of the triangle inequality, we get

$$\Big| f(x) - \sum_{j=1}^{P} \sum_{k=1}^{N} a_{jk} g_{jk}(\phi_j(x)) \Big| \le \Big| f(x) - \sum_{j=1}^{P} u_j(y_j(x)) \Big| + \Big| \sum_{j=1}^{P} u_j(y_j(x)) - \sum_{j=1}^{P} u_j(\lambda_j \phi_j(x)) \Big| + \Big| \sum_{j=1}^{P} u_j(\lambda_j \phi_j(x)) - \sum_{j=1}^{P} \sum_{k=1}^{N} a_{jk} g_{jk}(\phi_j(x)) \Big| < \epsilon.$$

Since $\sum_{j=1}^{P} \sum_{k=1}^{N} a_{jk} g_{jk}(\phi_j(x))$ may be written, after reindexing, as $\sum_{j=1}^{S} c_j g_j(\phi_j(x))$, we get the desired
result:

$$\Big| f(x) - \sum_{j=1}^{S} c_j g_j(\phi_j(x)) \Big| < \epsilon \quad\text{for } x \in C.$$

We note many similarities between this proof and part of Sandberg's.
The set of functions $G$ in Cheney's theorem is similar to the set of functions $U$
in Sandberg's, but the requirement in Sandberg's theorem on $U$ is less stringent.
The set $U$ is required only to approximate one specific function in $C(\mathbb{R})$, namely
the exponential function, $\exp(\beta)$, on a certain bounded set. Cheney's theorem,
on the other hand, requires that the set $G$ be fundamental in $C(\mathbb{R})$. This means
that any continuous function defined on a compact set in $\mathbb{R}$ is capable of being
approximated by linear combinations of elements of $G$.
6. Approximation and Classification



As previously mentioned, the problem of classifying signals plays an
important role in a variety of problems. We attempt to provide the framework
for a solution to some of these problems by restating the problem in a more
mathematical sense.

We assume first that all of the signals to be classified are drawn from
a normed linear space. For simplicity, we will further assume that each signal
may belong only to one of the classes. For example, assume that there are $n$
different classes $C_1, \ldots, C_n$ that are all subsets of a normed linear space $X$, and
that each signal received must necessarily belong to exactly one of the classes.

We now have the framework whereby we can view the classifier as a
mathematical function $f$ that takes the signal to be classified as input and
produces the desired class as output. For example, if $x \in C_j$, then $f(x) = a_j$,
where $a_1, \ldots, a_n$ are all distinct integers, would model a classification system
whereby each element of class $C_j$ is mapped to the integer $a_j$. A graph of this
simple function is shown in Figure 5. Our assumption that each signal may
belong to only one class means that the sets $C_j$ are pairwise disjoint.

In order to apply the theorems that we have developed, it is helpful to
assume that the sets $C_j$ are compact. This assumption will, of course, exclude
certain classification problems from the scope of these theorems. We now can
let $C = \bigcup_{j=1}^{n} C_j$. The set $C$ will now also be compact as it is the union of a finite
number of compact disjoint sets. Finally, since the function $f$ is constant on
each set $C_j$ and the distance between any pair of sets is positive, the function
$f$ is continuous. With these assumptions, we get the following:

Figure 5: Representation of a classifying function



1. Given $\epsilon > 0$, there are a positive integer $k$, real numbers $c_1, \ldots, c_k$,
elements $y_1, \ldots, y_k \in Y$, and elements $u_1, \ldots, u_k$ of $U$ such that

$$a_j - \epsilon < \sum_{i=1}^{k} c_i u_i[y_i(x)] < a_j + \epsilon$$

for $x \in C_j$ and $j = 1, \ldots, n$.

2. Given $\epsilon > 0$, there are a positive integer $k$, elements $y_1, \ldots, y_k$ of $Y$ and an $h \in D_k$
such that

$$a_j - \epsilon < h[y_1(x), \ldots, y_k(x)] < a_j + \epsilon$$

for $x \in C_j$ and $j = 1, \ldots, n$.

These follow directly from Sandberg's theorem.







Figure 6: A classifying network

This now allows us to use the above approximation network for the
purpose of classification. We require one additional element and that is a quantizer
$Q$. This quantizer is simply a real functional $Q : \mathbb{R} \to \mathbb{R}$ such that $Q$ maps
numbers in the interval $(a_j - \epsilon, a_j + \epsilon)$ to $a_j$. As long as we choose $\epsilon < 0.5 \min_{i \neq j} |a_i - a_j|$,
then this quantizer, when following a network of the structure defined above,
will allow the correct class to be output. This gives an entire structure for a
classification network. It is shown in Figure 6. The structure for $h$ as defined
in part (ii) of Sandberg's theorem is used in the figure.
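
A quantizer of this kind can be sketched as follows. This is a minimal illustration with hypothetical names; the choice to raise an error for inputs outside every interval $(a_j - \epsilon, a_j + \epsilon)$ is an assumption of the sketch, since the text does not specify the behavior of $Q$ there.

```python
def make_quantizer(labels, eps):
    """Return Q mapping any t in (a_j - eps, a_j + eps) to a_j."""
    gaps = [abs(a - b) for i, a in enumerate(labels) for b in labels[i + 1:]]
    if gaps and eps >= 0.5 * min(gaps):
        raise ValueError("eps must be smaller than half the smallest label gap")
    def Q(t):
        nearest = min(labels, key=lambda a: abs(t - a))
        if abs(t - nearest) >= eps:
            raise ValueError("network output is outside every label interval")
        return nearest
    return Q

Q = make_quantizer(labels=[1, 2, 5], eps=0.4)
```

For instance, a network output of 1.3 is assigned to the class labelled 1, and an output of 4.7 to the class labelled 5. The bound on `eps` ensures the label intervals do not overlap.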

We now turn to demonstrating some acceptable choices for the hidden
elements in our classification network. In all cases, the complete structure of
the network is as in Figure 6. No assumption is made about the number of
hidden elements that are necessary or the determination of the constants $c_j$. We are
concerned entirely with determining suitable choices for the $u_j$ and give several
examples as well as a justification for each here. In each case, the $y_j$ will be
assumed to be either bounded linear functionals on $X$ or elements capable of
uniformly approximating them.



Polynomial Networks 

A polynomial network is simply one in which each $u_j$ is a polynomial.
In the ridge function form, a polynomial network will be of the form

$$\sum_i \sum_j c_{ij} [\phi_i(x)]^j.$$

The original Weierstrass Approximation theorem showed that polynomials
are capable of uniformly approximating continuous functions on compact subsets of $\mathbb{R}$. Now, either Theorem 14 or Theorem
15 tells us that polynomials, when placed in the network, are capable of solving
the classification problem.
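
To see concretely why polynomials meet the requirement placed on the set $U$ in Theorem 14, the sketch below sums monomials $u_i(\beta) = \beta^i / i!$ and estimates how well they approximate $\exp$ on a bounded interval. The use of Taylor terms, the interval $[-2, 2]$, and the grid-based estimate are assumptions for illustration only.

```python
import math

def taylor_terms(degree):
    """Monomials u_i(beta) = beta**i / i!, i = 0..degree, playing the role of U."""
    return [lambda beta, i=i: beta**i / math.factorial(i) for i in range(degree + 1)]

def sup_gap(terms, a, b, points=1001):
    """Grid estimate of sup |exp(beta) - sum_i u_i(beta)| on [a, b]."""
    worst = 0.0
    for k in range(points):
        beta = a + (b - a) * k / (points - 1)
        worst = max(worst, abs(math.exp(beta) - sum(u(beta) for u in terms)))
    return worst

gaps = {n: sup_gap(taylor_terms(n), -2.0, 2.0) for n in (2, 6, 12)}
```

The estimated gap falls rapidly with the degree, so for any tolerance a finite number of such terms suffices on a given bounded interval.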

Exponential Networks 

An exponential network, in which each of the elements $u_j$ is of the form
$\exp(\cdot)$, is the simplest to justify, as the proof of Sandberg's theorem is based
on showing first how the exponential functional is capable of being used as
the nonlinear element and then showing how a function capable of uniformly
approximating it on a bounded interval is also acceptable.

Continuous Sigmoidal Networks 

A more complicated but extremely important type of network that is
useful for classification is a continuous sigmoidal network. It is first necessary
to define a sigmoidal function.

Definition: A functional $\sigma : \mathbb{R} \to \mathbb{R}$ is called a sigmoidal function or sigmoid
if

$$\lim_{t \to -\infty} \sigma(t) = 0 \quad\text{and}\quad \lim_{t \to \infty} \sigma(t) = 1.$$



In 1989, Cybenko (see [8]) proved that for any compact set $C \subset \mathbb{R}^n$, any
$f \in C(C)$, and for any $\epsilon > 0$ there exists a function $g$ of the form

$$g(x) = \sum_{j=1}^{m} a_j \sigma(\langle \gamma_j, x \rangle + \theta_j) \qquad (x, \gamma_j \in \mathbb{R}^n,\ a_j, \theta_j \in \mathbb{R}),$$

where $\sigma$ is a continuous sigmoidal function, such that

$$|g(x) - f(x)| < \epsilon \quad\text{for all } x \in C.$$

In other words, this sum of translations and dilations of a sigmoidal
function is capable of uniformly approximating any bounded continuous functional
on a compact subset of $\mathbb{R}^n$. Sandberg mentions in [20] that given that the
statement is true for $n = 1$, the (i) $\to$ (ii) section of his proof quickly extends
the result for $n > 1$. Indeed, let $X$ be simply $\mathbb{R}^n$, let the elements $y_j$ be linear
functionals defined on $\mathbb{R}^n$, and let $u_j(x) = c_j \sigma(\alpha_j x + \beta_j)$ where $c_j, \alpha_j, \beta_j \in \mathbb{R}$. This
gives us a sum of the type desired for $n > 1$.
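
For $n = 1$ the statement can be illustrated with a simple staircase construction. The sketch below is not Cybenko's proof, which is non-constructive; it assumes the logistic function as the sigmoid, places one steep sigmoid at each midpoint of a uniform grid on $[0, 1]$, and keeps an explicit constant term $f(0)$ for simplicity. All names and parameter values are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function, a continuous sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_sum(f, m, steepness):
    """Return g(x) = f(0) + sum_j a_j * sigmoid(w * (x - t_j)) on [0, 1]."""
    jumps = [f(j / m) - f((j - 1) / m) for j in range(1, m + 1)]   # a_j
    centres = [(j - 0.5) / m for j in range(1, m + 1)]             # t_j
    def g(x):
        return f(0.0) + sum(a * sigmoid(steepness * (x - t))
                            for a, t in zip(jumps, centres))
    return g

f = lambda x: math.sin(2.0 * math.pi * x)
g = sigmoid_sum(f, m=50, steepness=2000.0)
max_err = max(abs(f(i / 1000) - g(i / 1000)) for i in range(1001))
```

Increasing `m` (with the steepness scaled accordingly) tightens the approximation, at the cost of more hidden elements.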

In [5], Cheney demonstrates as a result of the general theory of ridge
functions that the result is applicable when the elements of the vectors $\gamma_j$ and
the numbers $\theta_j$ are integers. In fact, the theorem is given as follows.

Theorem 16 Let $g$ be a continuous function on $\mathbb{R}$ such that the limits of $g(t)$
as $t \to \infty$ and $t \to -\infty$ exist and are different. Put $g_{ij}(t) = g(jt + i)$. Then
$\{ g_{ij} : i, j \in \mathbb{Z} \}$ is fundamental in $C(\mathbb{R})$.

The proof of this theorem relies on measure theory, making use of the
Riesz Representation Theorem and the Dominated Convergence Theorem. It is
beyond the scope of this thesis but can be found in [4].

It is seen that this theorem allows $g$ to be a continuous sigmoid, but does
not require it. The only requirement when using the translations and dilations is
that the limits at $\infty$ and at $-\infty$ are not the same. It was mentioned earlier that
it is often desired that the output of the activation function in a neural
network be in a certain range such as $[0, 1]$. Sigmoidal functions fit nicely into
this framework.

Finally, we can show at once that these shifted and scaled sigmoidal
functions are capable of approximating on any normed linear space by using either
of the two main theorems after noting that they are capable of approximating
on $\mathbb{R}$.

Squashing Function Networks 

The previous section dealt with the use of translations and dilations
of continuous sigmoidal functions. In this section, we deal with a certain type
of sigmoid that is not necessarily continuous, the squashing function, and attempt
to obtain a similar result. A squashing function is defined in [12] as follows:

Definition: A function $\Psi : \mathbb{R} \to [0,1]$ is a squashing function if it is
nondecreasing, $\lim_{\lambda \to \infty} \Psi(\lambda) = 1$, and $\lim_{\lambda \to -\infty} \Psi(\lambda) = 0$.

It is seen at once that this definition simply requires that $\Psi$ be a
nondecreasing sigmoidal function (not necessarily continuous). Some useful squashing
functions include the threshold function, $\Psi(\lambda) = 1_{\{\lambda \ge 0\}}$, where $1_{\{\cdot\}}$ is the
indicator function; the ramp function, $\Psi(\lambda) = \lambda 1_{\{0 \le \lambda \le 1\}} + 1_{\{\lambda > 1\}}$; and the cosine
squasher (see [10]), $\Psi(\lambda) = (1 + \cos[\lambda + 3\pi/2])(1/2) 1_{\{-\pi/2 \le \lambda \le \pi/2\}} + 1_{\{\lambda > \pi/2\}}$.
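The three squashing functions just listed can be written out directly. The
sketch below is illustrative only: the function names are our own, and the
helper `looks_like_squasher` is a hypothetical grid check of the defining
properties (range, monotonicity, limiting values), not a proof of them.

```python
import math

def threshold(lam):
    """Threshold function: indicator of lam >= 0."""
    return 1.0 if lam >= 0 else 0.0

def ramp(lam):
    """Ramp function: 0 below 0, lam on [0, 1], 1 above 1."""
    if lam < 0:
        return 0.0
    if lam > 1:
        return 1.0
    return lam

def cosine_squasher(lam):
    """Cosine squasher: (1 + cos(lam + 3*pi/2)) / 2 on [-pi/2, pi/2]."""
    if lam < -math.pi / 2:
        return 0.0
    if lam > math.pi / 2:
        return 1.0
    return (1.0 + math.cos(lam + 1.5 * math.pi)) / 2.0

def looks_like_squasher(psi, lo=-10.0, hi=10.0, steps=2001):
    """Grid check: values in [0, 1], nondecreasing, near 0 and 1 at the ends."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    values = [psi(t) for t in grid]
    monotone = all(a <= b + 1e-12 for a, b in zip(values, values[1:]))
    in_range = all(-1e-12 <= v <= 1.0 + 1e-12 for v in values)
    return monotone and in_range and values[0] < 1e-9 and values[-1] > 1.0 - 1e-9
```

Note that on $[-\pi/2, \pi/2]$ the cosine squasher equals $(1 + \sin\lambda)/2$, which makes
its monotonicity evident.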

Hornik et al. first define what they call a sigma-pi network and prove
certain results pertaining to it. Following this, they extend the results to a
network resembling those that have been mentioned above. We proceed as
they did, considering only the $\mathbb{R}^1$ case.

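Definition: For any measurable function $G$ mapping $\mathbb{R}$ to $\mathbb{R}$, let $\Sigma\Pi^1(G)$ be
the class of functions

$$\Big\{ f : \mathbb{R} \to \mathbb{R} : f(x) = \sum_{j=1}^{q} \beta_j \prod_{k=1}^{l_j} G(A_{jk}(x)), \ x, \beta_j \in \mathbb{R}, \ A_{jk} \in \mathcal{A}, \ q = 1, 2, \ldots \Big\},$$

where $l_j \in \mathbb{N}$ and $\mathcal{A}$ is the set of all affine functions from $\mathbb{R}$ to $\mathbb{R}$, that is, the
set of all functions of the form $A(x) = wx + b$ where $w, b \in \mathbb{R}$. Networks of
this form are referred to as sigma-pi networks.

Definition: For any measurable function $G$ mapping $\mathbb{R}$ to $\mathbb{R}$, let $\Sigma^1(G)$ be
the class of functions

$$\Big\{ f : \mathbb{R} \to \mathbb{R} : f(x) = \sum_{j=1}^{q} \beta_j G(A_j(x)), \ x, \beta_j \in \mathbb{R}, \ A_j \in \mathcal{A}, \ q = 1, 2, \ldots \Big\}.$$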



The form of this second network clearly resembles the continuous sigmoidal
network that was shown above if $G$ is taken to be a continuous sigmoidal
function. The shifting and scaling that was present above is simply performed
by the affine function here; only the notation is different. For now, we will
continue to let $G$ be any measurable function.

We now give the main result that applies here. 

Theorem 17 For every squashing function $\Psi$, $\Sigma^1(\Psi)$ is uniformly dense on
compacta in $C(\mathbb{R})$.

Proof: We proceed by first proving several lemmas that will aid in the proof. 

Lemma 1: Let $G : \mathbb{R} \to \mathbb{R}$ be continuous and nonconstant. Then $\Sigma\Pi^1(G)$ is
uniformly dense on compacta in $C(\mathbb{R})$.

Proof: We can apply the Stone-Weierstrass Theorem here. Let $C \subset \mathbb{R}$ be
any compact set. For any $G$, $\Sigma\Pi^1(G)$ is obviously an algebra on $C$. If $x$,
$y \in C$, $x \ne y$, then we can find an $A_1 \in \mathcal{A}$ such that $G(A_1(x)) \ne G(A_1(y))$.
To show this, pick $a, b \in \mathbb{R}$, $a \ne b$, such that $G(a) \ne G(b)$. Then choose $A_1$
to satisfy $A_1(x) = a$ and $A_1(y) = b$. Then $G(A_1(x)) \ne G(A_1(y))$. This ensures
that $\Sigma\Pi^1(G)$ is separating. Now we must show that $\Sigma\Pi^1(G)$ vanishes at no
point of $C$. Pick $b \in \mathbb{R}$ such that $G(b) \ne 0$ and set $A_2(x) = 0 \cdot x + b$. For all
$x \in C$, $G(A_2(x)) = G(b) \ne 0$, so this is a nonvanishing constant function. The
Stone-Weierstrass theorem now guarantees that $\Sigma\Pi^1(G)$ is capable of uniformly
approximating any continuous function on $C$.
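The separating step of this proof is constructive, and a short sketch may make
it concrete. The helper name `separating_affine` is hypothetical, and the choice
$G = \tanh$ with $a = 0$, $b = 1$ in the example is an assumption made only for
illustration.

```python
import math

def separating_affine(x, y, a, b):
    """Return (w, c) such that A(t) = w*t + c satisfies A(x) = a and A(y) = b.

    Requires x != y, exactly as in the proof.
    """
    if x == y:
        raise ValueError("x and y must be distinct")
    w = (a - b) / (x - y)
    c = a - w * x
    return w, c

# Example: G = tanh is continuous and nonconstant, with G(0) != G(1).
G = math.tanh
x, y = 2.0, 5.0
w, c = separating_affine(x, y, a=0.0, b=1.0)
separated = G(w * x + c) != G(w * y + c)
```

Because two distinct points determine a unique affine function, the only
freedom in the argument is the choice of $a$ and $b$ with $G(a) \ne G(b)$.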

This lemma shows that the sigma-pi networks are capable of uniform
approximation of any continuous function on a compact set for any choice of $G$,
the only requirements being that $G$ be continuous and nonconstant. We have
not yet required that $G$ be a squashing function.

Lemma 2: Let $F$ be a continuous squashing function and $\Psi$ be an arbitrary
squashing function. For every $\epsilon > 0$ there is an element $H_\epsilon$ of $\Sigma^1(\Psi)$ such that

$$\sup_{\lambda \in \mathbb{R}} |F(\lambda) - H_\epsilon(\lambda)| < \epsilon.$$

Proof: Choose $\epsilon > 0$ and assume without loss of generality that $\epsilon < 1$. We
must now find constants $\beta_j$ and affine functions $A_j$, $j \in \{1, 2, \ldots, Q-1\}$, such
that

$$\sup_{\lambda \in \mathbb{R}} \Big| F(\lambda) - \sum_{j=1}^{Q-1} \beta_j \Psi(A_j(\lambda)) \Big| < \epsilon.$$

Choose $Q$ such that $1/Q < \epsilon/2$. For $j \in \{1, 2, \ldots, Q-1\}$, set $\beta_j = 1/Q$. Pick
$M > 0$ such that $\Psi(-M) < \epsilon/(2Q)$ and $\Psi(M) > 1 - \epsilon/(2Q)$. Such an $M$ can
be found because $\Psi$ is a squashing function. For $j \in \{1, 2, \ldots, Q-1\}$, set
$r_j = \sup\{\lambda : F(\lambda) = j/Q\}$. Set $r_Q = \sup\{\lambda : F(\lambda) = 1 - 1/(2Q)\}$. Because $F$ is a
continuous squashing function, such $r_j$'s exist. Now, for any $r < s$, let $A_{r,s} \in \mathcal{A}$
be the unique affine function satisfying $A_{r,s}(r) = -M$ and $A_{r,s}(s) = M$, so that
$\Psi(A_{r,s}(\lambda))$ is nondecreasing in $\lambda$. The desired approximation is then

$$H_\epsilon(\lambda) = \sum_{j=1}^{Q-1} \beta_j \Psi(A_{r_j, r_{j+1}}(\lambda)).$$

We can easily check that on the intervals $(-\infty, r_1], (r_1, r_2], \ldots, (r_{Q-1}, r_Q], (r_Q, \infty)$,
$|F(\lambda) - H_\epsilon(\lambda)| < \epsilon$.
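The construction in this proof can be checked numerically. The sketch below is
only an illustration under stated assumptions: $F$ is taken to be the logistic
function (so that each $r_j$ has a closed form), $\Psi$ is the threshold function,
and each affine map is taken to send $r_j$ to $-M$ and $r_{j+1}$ to $M$ so that every
term is nondecreasing. The names `build_H` and `logistic_level` are hypothetical,
and the supremum is estimated on a finite grid rather than over all of $\mathbb{R}$.

```python
import math

def logistic(lam):
    """Continuous squashing function F (an assumed example)."""
    return 1.0 / (1.0 + math.exp(-lam))

def logistic_level(p):
    """The unique lam with F(lam) = p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def threshold(lam):
    """A discontinuous squashing function Psi."""
    return 1.0 if lam >= 0 else 0.0

def build_H(eps, psi=threshold, M=1.0):
    """Return H_eps = sum_j (1/Q) * psi(A_j(.)) following the proof's recipe."""
    Q = int(2.0 / eps) + 2            # guarantees 1/Q < eps/2
    # r_1..r_{Q-1} solve F(r_j) = j/Q; r_Q solves F(r_Q) = 1 - 1/(2Q).
    r = [logistic_level(j / Q) for j in range(1, Q)]
    r.append(logistic_level(1.0 - 1.0 / (2 * Q)))

    def H(lam):
        total = 0.0
        for j in range(Q - 1):
            lo, hi = r[j], r[j + 1]
            # Affine map with A(lo) = -M and A(hi) = M (increasing by assumption).
            A = -M + 2.0 * M * (lam - lo) / (hi - lo)
            total += psi(A) / Q       # beta_j = 1/Q
        return total

    return H

eps = 0.1
H = build_H(eps)
grid = [-15.0 + 30.0 * i / 6000 for i in range(6001)]
sup_error = max(abs(logistic(t) - H(t)) for t in grid)
```

With the threshold function, $H_\epsilon$ is a staircase with steps of height $1/Q$, which
is the intuition behind the lemma: a monotone continuous curve is tracked to
within a couple of step heights.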



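Lemma 3: For every squashing function $\Psi$, every $\epsilon > 0$, and every $M > 0$ there
is a function $\cos_{M,\epsilon} \in \Sigma^1(\Psi)$ such that

$$\sup_{\lambda \in [-M, M]} |\cos_{M,\epsilon}(\lambda) - \cos(\lambda)| < \epsilon.$$

Proof: Let $F$ be the cosine squasher previously defined. By adding, subtracting,
and scaling a finite number of affinely shifted versions of $F$, we can get the
cosine function on any interval $[-M, M]$; denote this finite combination by
$\tilde F$, so that $\tilde F(\lambda) = \cos(\lambda)$ on $[-M, M]$. Since $F$ is continuous, we may apply
Lemma 2 to each shifted copy of $F$ and use the triangle inequality to obtain the
result. Indeed, let $G$ be the element of $\Sigma^1(\Psi)$ obtained by replacing each copy
of $F$ in $\tilde F$ by its approximation from Lemma 2, with the tolerances chosen so
that the total error is less than $\epsilon$. We then have on the interval $[-M, M]$,

$$|G(\lambda) - \cos(\lambda)| \le |G(\lambda) - \tilde F(\lambda)| + |\tilde F(\lambda) - \cos(\lambda)| = |G(\lambda) - \tilde F(\lambda)| + 0 < \epsilon,$$

where the last inequality follows from Lemma 2.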

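Lemma 4: Let $g(\cdot) = \sum_{j=1}^{Q} \beta_j \cos(A_j(\cdot))$, $A_j \in \mathcal{A}$. For arbitrary squashing
function $\Psi$, arbitrary compact $C \subset \mathbb{R}$, and arbitrary $\epsilon > 0$, there is an $f \in \Sigma^1(\Psi)$
such that $\sup_{x \in C} |g(x) - f(x)| < \epsilon$.

Proof: Pick $M > 0$ such that for $j \in \{1, 2, \ldots, Q\}$, $A_j(C) \subset [-M, M]$. Because
$Q$ is finite, $C$ is compact, and the $A_j(\cdot)$ are continuous, such an $M$ can
be found. Let $Q' = Q \cdot \sum_{j=1}^{Q} |\beta_j|$. From Lemma 3, for all $x \in C$ we have

$$\Big| \sum_{j=1}^{Q} \beta_j \cos_{M,\epsilon/Q'}(A_j(x)) - g(x) \Big| < \epsilon.$$

Because $\cos_{M,\epsilon/Q'} \in \Sigma^1(\Psi)$, we see that
$f(\cdot) = \sum_{j=1}^{Q} \beta_j \cos_{M,\epsilon/Q'}(A_j(\cdot)) \in \Sigma^1(\Psi)$.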

Now we turn to proving the theorem. By Lemma 1, the trigonometric
polynomials

$$\Big\{ \sum_{j=1}^{Q} \beta_j \prod_{k=1}^{l_j} \cos(A_{jk}(\cdot)) : Q, l_j \in \mathbb{N}, \ \beta_j \in \mathbb{R}, \ A_{jk} \in \mathcal{A} \Big\}$$

are uniformly dense on compact sets in $C(\mathbb{R})$. By repeated application of the
trigonometric identity $\cos(a)\cos(b) = \frac{1}{2}[\cos(a+b) + \cos(a-b)]$, we may write every
trigonometric polynomial in the form $\sum_{t=1}^{T} \alpha_t \cos(A_t(\cdot))$ where $\alpha_t \in \mathbb{R}$ and
$A_t \in \mathcal{A}$. The desired result now follows from Lemma 4.
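The reduction from products of cosines to sums of cosines can be carried out
mechanically. The following sketch applies the product-to-sum identity
$\cos(u)\cos(v) = \frac{1}{2}[\cos(u+v) + \cos(u-v)]$ repeatedly to a product of cosines of
affine functions; the representation of an affine function $wx + b$ as a pair
`(w, b)` and all function names are our own illustrative choices.

```python
import math

def product_to_sum(factors):
    """Rewrite prod_k cos(w_k*x + b_k) as sum_t alpha_t * cos(w_t*x + b_t).

    `factors` is a non-empty list of (w, b) pairs. The result is a list of
    (alpha, w, b) triples; sums and differences of affine maps stay affine.
    """
    w0, b0 = factors[0]
    terms = [(1.0, w0, b0)]
    for w, b in factors[1:]:
        new_terms = []
        for alpha, wt, bt in terms:
            # cos(u)cos(v) = (cos(u + v) + cos(u - v)) / 2
            new_terms.append((alpha / 2.0, wt + w, bt + b))
            new_terms.append((alpha / 2.0, wt - w, bt - b))
        terms = new_terms
    return terms

def eval_product(factors, x):
    """Evaluate the original product of cosines at x."""
    return math.prod(math.cos(w * x + b) for w, b in factors)

def eval_sum(terms, x):
    """Evaluate the rewritten sum of cosines at x."""
    return sum(alpha * math.cos(w * x + b) for alpha, w, b in terms)

factors = [(1.0, 0.5), (2.0, -1.0), (0.5, 0.25)]
terms = product_to_sum(factors)
```

A product of $l$ cosines becomes a sum of $2^{l-1}$ cosines, so the number $T$ of terms
in the rewritten polynomial is finite, as the proof requires.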

This now gives us another class of acceptable functions for the $u_j$ in
Figure 6, and choosing a squashing function will ensure that the output of each
$u_j$ is always between 0 and 1.

Radial Basis Function Networks 

An important type of function that may be used in some classifying
networks is the radial basis function, and more specifically, the Gaussian basis
function. While we cannot claim that a radial basis function network may be
used for uniform approximation in all cases, there are some examples that are
useful. Information about the universal approximation capability of radial basis
function networks may be found in [17]. We define a radial basis function as a
function which depends only on the norm of the argument. In other words, if $f$
is a radial basis function and $\|x\| = \|y\|$, then $f(x) = f(y)$.

We now give an example of a case in which uniform approximation is
possible using a radial basis function network. In this particular instance the basis
functions are Gaussian, functions that have other useful properties for
approximation networks. Let $H$ be a Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and
norm $\| \cdot \|$ defined in the usual way. We are interested mainly in $H = \mathbb{R}^n$
with $\|x\| = (\sum_j x_j^2)^{1/2}$. Let $C \subset H$ be compact and let $V \subset H$ be nonempty,
convex, and satisfy the condition that for $x_1, x_2 \in C$ with $x_1 \ne x_2$ there exists
$v \in V$ such that $\|x_1 - v\| \ne \|x_2 - v\|$. We can, for example, take $V$ to be $C$
as long as $C$ is convex, or we can take $V$ to be any nonempty convex subset
of $H$ containing an interior point. Let $P$ be a nonempty subset of $(0, \infty)$ or
$(-\infty, 0)$ that is closed under addition. Finally, let

$$L = \Big\{ g : C \to \mathbb{R} : g(x) = \sum_{j=1}^{m} a_j \exp(-\alpha_j \|x - v_j\|^2), \ m < \infty, \ a_j \in \mathbb{R}, \ \alpha_j \in P, \ v_j \in V \Big\}.$$

It is immediately seen that the structure of $L$ is of the form needed for the
elements $u_j$ in Figure 6. With these assumptions we get the following theorem.

Theorem 18 Let $f : C \to \mathbb{R}$ be continuous and let $\epsilon > 0$. Then there exists a
$g \in L$ such that

$$|f(a) - g(a)| < \epsilon, \quad a \in C.$$

Proof: Using the property above and the convexity of $V$, we see that given $a_1$,
$a_2 \in \mathbb{R}$, $\alpha_1, \alpha_2 \in P$, and $v_1, v_2 \in V$,

$$a_1 \exp(-\alpha_1 \|x - v_1\|^2) \, a_2 \exp(-\alpha_2 \|x - v_2\|^2) = b \exp(-(\alpha_1 + \alpha_2) \|x - w\|^2)$$

for some $b \in \mathbb{R}$ and $w \in V$. Also we can see that $\alpha_1 + \alpha_2 \in P$. So $L$ is an
algebra. Choose $x_1$ and $x_2$ in $C$ and assume that $x_1 \ne x_2$. Then $\|x_1 - v\| \ne
\|x_2 - v\|$ for some $v \in V$ by our first assumption. Therefore, for any $\alpha \in P$,
$\exp(-\alpha \|x_1 - v\|^2) \ne \exp(-\alpha \|x_2 - v\|^2)$, so $L$ separates the points of $C$. Since
each Gaussian is strictly positive, $L$ also vanishes at no point of $C$. Therefore, by
the Stone-Weierstrass theorem, the proof is complete.
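The product identity used to show that $L$ is an algebra can be made explicit.
Completing the square suggests $w = (\alpha_1 v_1 + \alpha_2 v_2)/(\alpha_1 + \alpha_2)$, a convex
combination of $v_1$ and $v_2$ when $\alpha_1$ and $\alpha_2$ have the same sign, and
$b = a_1 a_2 \exp(-\alpha_1 \alpha_2 \|v_1 - v_2\|^2 / (\alpha_1 + \alpha_2))$. These closed forms are our own
derivation rather than a statement from the text, and the sketch below simply
checks them numerically in $\mathbb{R}^2$ with hypothetical parameter values.

```python
import math

def sq_dist(x, v):
    """Squared Euclidean distance ||x - v||^2."""
    return sum((xi - vi) ** 2 for xi, vi in zip(x, v))

def gaussian(a, alpha, v):
    """Return the function x -> a * exp(-alpha * ||x - v||^2)."""
    return lambda x: a * math.exp(-alpha * sq_dist(x, v))

def product_parameters(a1, alpha1, v1, a2, alpha2, v2):
    """Return (b, alpha, w) with g1(x) * g2(x) = b * exp(-alpha * ||x - w||^2)."""
    alpha = alpha1 + alpha2
    t = alpha1 / alpha                 # lies in (0, 1) when the alphas share a sign
    w = tuple(t * p + (1.0 - t) * q for p, q in zip(v1, v2))
    b = a1 * a2 * math.exp(-(alpha1 * alpha2 / alpha) * sq_dist(v1, v2))
    return b, alpha, w

# Hypothetical parameters, chosen only to exercise the identity.
g1 = gaussian(2.0, 0.5, (0.0, 1.0))
g2 = gaussian(-3.0, 1.5, (1.0, -1.0))
b, alpha, w = product_parameters(2.0, 0.5, (0.0, 1.0), -3.0, 1.5, (1.0, -1.0))
g_prod = gaussian(b, alpha, w)
```

The fact that $w$ is a convex combination is exactly where the convexity of $V$ is
needed, and $\alpha_1 + \alpha_2 \in P$ is where closure of $P$ under addition is needed.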

Thus, in this somewhat less general compact space, the Gaussian basis
functions are capable of uniformly approximating any continuous real-valued
function. They therefore may be used as the elements $u_j$ in our original network.

7. Applications



Classifier Example 

At this point we are ready to give an example of an actual classification
network using the framework that we have provided. This example will also
show how the mathematical formulations that we have been making relate to
the signal classification problems that were initially discussed.

Let $X$ be the space of continuous real-valued functions defined on $[0,1]^n$
with $\| \cdot \|$ the usual sup norm. Let $k$ and $r$ be positive constants and let Lip($k$)
denote the subset of $X$ consisting of the elements of $X$ that satisfy a Lipschitz
condition: $|x(a) - x(b)| \le k|a - b|$ for all $a$ and $b$. This is a typical way to deal
with a good class of nonlinear functions. Let $\xi_1, \xi_2, \ldots, \xi_m$ be distinct elements
of Lip($k$) and let $C_j = \{x \in \mathrm{Lip}(k) : \|x - \xi_j\| \le r\}$ for each $j = 1, 2, \ldots, m$.

Now assume that $r < (1/2) \min_{i \ne j} \|\xi_i - \xi_j\|$. It is clear that the $C_j$ are
pairwise disjoint if this condition is satisfied. Since each $C_j$ is a closed bounded
subset of $X$ that is equicontinuous on $[0,1]^n$, the Arzela-Ascoli theorem
(see [15]) shows that the $C_j$ are compact. As we have shown earlier, since
the $C_j$ are compact and pairwise disjoint, the union $\bigcup_j C_j$
is also compact.
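The disjointness condition on $r$ is easy to illustrate for $n = 1$. In the sketch
below the sup norm is approximated on a finite grid, the three centers are
hypothetical Lipschitz signals, and `classify` is a hypothetical helper that
reports which ball (if any) a signal falls in. It illustrates the sets $C_j$
only; it is not the network classifier itself.

```python
import math
from itertools import combinations

def sup_distance(f, g, steps=1001):
    """Grid approximation of the sup-norm distance between f and g on [0, 1]."""
    return max(abs(f(i / (steps - 1)) - g(i / (steps - 1))) for i in range(steps))

def radius_bound(centers):
    """Half the smallest pairwise distance; any r below this gives disjoint balls."""
    return 0.5 * min(sup_distance(f, g) for f, g in combinations(centers, 2))

def classify(x, centers, r):
    """Return the index j with ||x - center_j|| <= r, or None if there is none."""
    for j, center in enumerate(centers):
        if sup_distance(x, center) <= r:
            return j
    return None

# Hypothetical centers on [0, 1]; each is Lipschitz with constant 1.
centers = [lambda a: 0.0, lambda a: a, lambda a: 1.0 - a]
bound = radius_bound(centers)
r = 0.4                                  # any r < bound works
signal = lambda a: a + 0.1 * math.sin(a)  # a perturbation of the second center
label = classify(signal, centers, r)
```

Because $r$ is below half the minimum distance between centers, the triangle
inequality guarantees that at most one index can be returned.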

We now introduce a theorem in [20] without the proof given there. 

Theorem 19 Let $X$ denote the normed linear space of $\mathbb{R}$-valued continuous
functions on $I := [0,1]^n$, with the usual max norm. Let $C \subset X$ be compact, let
$g \in X^*$, and let $\epsilon > 0$. Then there are points $a_1, \ldots, a_p \in I$, points
$c_1, \ldots, c_p \in \mathbb{R}$, and a $q \in X$ such that

$$\sup_{x \in C} \Big| g(x) - \sum_{j=1}^{p} c_j x(a_j) \Big| < \epsilon$$

and

$$\sup_{x \in C} \Big| g(x) - \int_I q(a) x(a) \, da \Big| < \epsilon.$$

This theorem shows that a classifier can be found in this case using a
simple sampling and summing operation or an integration. It applies directly
to our example at hand since we are working on $[0,1]^n$. We now know that it
is possible to classify the signals in our example using the structure in Figure
6 where the functional $y_j$ performs the sampling and summing or integration
operation.
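The relationship between the two forms in the theorem can be illustrated for
$n = 1$. In the sketch below the weight function $q$ and the test signals are
hypothetical, the integral is replaced by a fine midpoint rule, and the sample
points $a_j$ and coefficients $c_j = q(a_j)/p$ come from a coarse midpoint rule. This
is one simple way to obtain a sampling-and-summing functional from an integral
one; it is not claimed to be the construction used in [20].

```python
import math

def integral_functional(q, x, steps=20000):
    """Midpoint-rule stand-in for g(x) = integral over [0, 1] of q(a) x(a) da."""
    h = 1.0 / steps
    return sum(q((i + 0.5) * h) * x((i + 0.5) * h) for i in range(steps)) * h

def midpoint_samples(q, p):
    """Sample points a_j and coefficients c_j = q(a_j) / p on [0, 1]."""
    points = [(j + 0.5) / p for j in range(p)]
    coeffs = [q(a) / p for a in points]
    return points, coeffs

def sampling_functional(points, coeffs):
    """Return the functional x -> sum_j c_j * x(a_j)."""
    return lambda x: sum(c * x(a) for a, c in zip(points, coeffs))

# Hypothetical weight function and signals, for illustration only.
q = lambda a: 1.0 + a
signals = [lambda a: a * a, lambda a: math.sin(3.0 * a)]

points, coeffs = midpoint_samples(q, p=50)
y = sampling_functional(points, coeffs)
gaps = [abs(y(x) - integral_functional(q, x)) for x in signals]
```

For smooth signals such as these, fifty samples already bring the two
functionals into close agreement; the theorem asserts that some finite $p$
suffices uniformly over a compact set of signals.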

This problem is very applicable to the examples discussed earlier. If n = 
1, 2, or 3, we are classifying continuous signals in one, two, or three variables. 
This is the kind of sensor input that we might have in the automatic target 
identification and pattern recognition examples that were mentioned earlier. 

Conclusions 

We have described a specific neural network structure that is capable of 
solving certain classification problems. This structure has the form of a single 
hidden layer feedforward neural network and therefore possesses the advantages 
of neural networks that were mentioned above. It has a simple framework that 
is easily built in hardware or simulated in software. 

It is important to note that there are limitations to the methods presented
here. All of the proofs are existence proofs. They guarantee that a
solution is possible and in some cases give a general idea of how it might be
accomplished. For example, we have seen how certain classes of functions such as
sigmoids and polynomials are capable of being used as the activation functions
(the $u_j$) in a classifying neural network. What has not been determined is the
number of nodes needed. We can only say that classification is possible with a
finite number of nodes. Further, we have not given a definite method of finding
the weights $c_j$ in Figure 6. This is what is typically referred to as training the
neural network.

In spite of these shortcomings, we have succeeded in providing a general
framework for studying the important problem of signal classification.
We have accomplished this by using well-known theorems dealing with
approximation. This area of research is fairly new and has proven extremely useful so
far, and interest in it will continue to grow in the future.